[jira] [Resolved] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread PengYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengYang resolved FLINK-8572.
-
  Resolution: Fixed
Release Note: I found that the problem was caused by our company restricting 
access to some Google services.

> Flink 1.4 documentation: clicking the left menu does not navigate and the 
> drop-down menus cannot be expanded
> ---
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 documentation, clicking the left-hand menu does not 
> navigate, and the drop-down menus cannot be expanded.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread PengYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengYang updated FLINK-8572:

Description: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]

In the Flink 1.4 documentation, clicking the left-hand menu does not navigate, 
and the drop-down menus cannot be expanded.

 

  was:
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]

In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.

 


> Flink 1.4 documentation: clicking the left menu does not navigate and the 
> drop-down menus cannot be expanded
> ---
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 documentation, clicking the left-hand menu does not 
> navigate, and the drop-down menus cannot be expanded.
>  





[jira] [Updated] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread PengYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengYang updated FLINK-8572:

Summary: Flink 1.4 documentation: clicking the left menu does not navigate and 
the drop-down menus cannot be expanded  (was: In the Flink 1.4 documentation, 
clicking the left menu does not navigate and the drop-down menus cannot be expanded)

> Flink 1.4 documentation: clicking the left menu does not navigate and the 
> drop-down menus cannot be expanded
> ---
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.
>  





[jira] [Commented] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread PengYang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355094#comment-16355094
 ] 

PengYang commented on FLINK-8572:
-

Thanks. I found that the problem was caused by our company restricting access 
to some Google services. I had tried a variety of browsers without success, so 
I thought it was a mistake in the documentation. I'm really sorry to bother 
you, thank you!

> Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded
> 
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.
>  





[jira] [Created] (FLINK-8573) Print JobID for failed jobs

2018-02-06 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-8573:
---

 Summary: Print JobID for failed jobs
 Key: FLINK-8573
 URL: https://issues.apache.org/jira/browse/FLINK-8573
 Project: Flink
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.5.0
Reporter: Chesnay Schepler


When a job runs successfully, the client prints something along the lines of 
"Job with  successfully run". If the job fails, however, we only print the 
exception but not the JobID.





[jira] [Closed] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8567.
---
Resolution: Not A Problem

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> Maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> decompiling the flink-table jar and found that the Guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the original Google 
> Guava and the Flink-shaded Guava. Why is the Google Guava still included in 
> Flink?
>  Because we build flink-dist with the Maven assembly plugin, which aggregates 
> all dependencies, including those that have already been shaded. With the 
> following patch, which excludes Guava, flink-dist_${scala.binary.version}.jar 
> contains only flink-shaded-guava, not the Google Guava.





[jira] [Reopened] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reopened FLINK-8567:
-

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> Maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> decompiling the flink-table jar and found that the Guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the original Google 
> Guava and the Flink-shaded Guava. Why is the Google Guava still included in 
> Flink?
>  Because we build flink-dist with the Maven assembly plugin, which aggregates 
> all dependencies, including those that have already been shaded. With the 
> following patch, which excludes Guava, flink-dist_${scala.binary.version}.jar 
> contains only flink-shaded-guava, not the Google Guava.





[jira] [Updated] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-8567:

Fix Version/s: (was: 2.0.0)

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> Maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> decompiling the flink-table jar and found that the Guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the original Google 
> Guava and the Flink-shaded Guava. Why is the Google Guava still included in 
> Flink?
>  Because we build flink-dist with the Maven assembly plugin, which aggregates 
> all dependencies, including those that have already been shaded. With the 
> following patch, which excludes Guava, flink-dist_${scala.binary.version}.jar 
> contains only flink-shaded-guava, not the Google Guava.





[jira] [Commented] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355086#comment-16355086
 ] 

Chesnay Schepler commented on FLINK-8572:
-

[~mingleizhang] Could you translate the issue for us non-Chinese speakers?

> Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded
> 
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.
>  





[jira] [Commented] (FLINK-8516) FlinkKinesisConsumer does not balance shards over subtasks

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355075#comment-16355075
 ] 

ASF GitHub Bot commented on FLINK-8516:
---

Github user tweise commented on the issue:

https://github.com/apache/flink/pull/5393
  
@tzulitai Assignment and restore are orthogonal; this PR doesn't change 
restore, so I don't see the need to add another migration test for it. There 
is also no change to how the index is computed by default; part of what 
previously happened in `isThisSubtaskShouldSubscribeTo` now happens in the 
assigner. Should I add a (trivial) unit test asserting that 
`isThisSubtaskShouldSubscribeTo` applies the modulus to assigner-returned 
values that fall outside the subtask index range?


> FlinkKinesisConsumer does not balance shards over subtasks
> --
>
> Key: FLINK-8516
> URL: https://issues.apache.org/jira/browse/FLINK-8516
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>
> The hash code of the shard is used to distribute discovered shards over 
> subtasks round robin. This works as long as shard identifiers are sequential. 
> After shards are rebalanced in Kinesis, that may no longer be the case and 
> the distribution becomes skewed.
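The mechanics discussed above can be sketched as follows. This is a hedged illustration, not the actual connector API: `ShardAssigner` and `isAssignedTo` are hypothetical stand-ins for `KinesisShardAssigner` and `isThisSubtaskShouldSubscribeTo`, but the key point matches the discussion: the modulus is applied to whatever the assigner returns, so out-of-range values still map to a valid subtask, and an index-based round-robin assigner stays balanced where raw hash codes may skew.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardAssignmentSketch {
    // Hypothetical stand-in for a shard assigner: maps a shard id to an int.
    interface ShardAssigner {
        int assign(String shardId, int numSubtasks);
    }

    // The modulus is applied to the assigner's value, so values outside
    // [0, numSubtasks) still land on a valid subtask index.
    static boolean isAssignedTo(String shardId, int subtaskIndex, int numSubtasks,
                                ShardAssigner assigner) {
        return Math.floorMod(assigner.assign(shardId, numSubtasks), numSubtasks) == subtaskIndex;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard-0", "shard-1", "shard-2", "shard-3");
        // Index-based round-robin: even distribution regardless of shard id values.
        ShardAssigner roundRobin = (shard, n) -> shards.indexOf(shard);

        int numSubtasks = 2;
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String s : shards) {
            for (int sub = 0; sub < numSubtasks; sub++) {
                if (isAssignedTo(s, sub, numSubtasks, roundRobin)) {
                    counts.merge(sub, 1, Integer::sum);
                }
            }
        }
        System.out.println(counts); // {0=2, 1=2}
    }
}
```

With four shards and two subtasks, the round-robin assigner yields two shards per subtask, which is the balanced distribution the issue asks for.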





[GitHub] flink issue #5393: [FLINK-8516] Allow for custom hash function for shard to ...

2018-02-06 Thread tweise
Github user tweise commented on the issue:

https://github.com/apache/flink/pull/5393
  
@tzulitai Assignment and restore are orthogonal; this PR doesn't change 
restore, so I don't see the need to add another migration test for it. There 
is also no change to how the index is computed by default; part of what 
previously happened in `isThisSubtaskShouldSubscribeTo` now happens in the 
assigner. Should I add a (trivial) unit test asserting that 
`isThisSubtaskShouldSubscribeTo` applies the modulus to assigner-returned 
values that fall outside the subtask index range?


---


[jira] [Commented] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355036#comment-16355036
 ] 

mingleizhang commented on FLINK-8572:
-

[~LaiAng] You can try another browser to see whether it works.

> Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded
> 
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.
>  





[jira] [Commented] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355032#comment-16355032
 ] 

mingleizhang commented on FLINK-8572:
-

Hi [~LaiAng], please change the subject to English. Also, what you mentioned 
here works on my side, so I don't think it is a bug.

> Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded
> 
>
> Key: FLINK-8572
> URL: https://issues.apache.org/jira/browse/FLINK-8572
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.0
>Reporter: PengYang
>Priority: Major
> Attachments: Q%_B9)W$@~YVZN~9...@zo4.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]
> In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.
>  





[jira] [Created] (FLINK-8572) Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded

2018-02-06 Thread PengYang (JIRA)
PengYang created FLINK-8572:
---

 Summary: Flink 1.4 documentation: clicking the left menu does not navigate and the drop-down menus cannot be expanded
 Key: FLINK-8572
 URL: https://issues.apache.org/jira/browse/FLINK-8572
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.0
Reporter: PengYang
 Attachments: Q%_B9)W$@~YVZN~9...@zo4.png

[https://ci.apache.org/projects/flink/flink-docs-release-1.4/#collapse-116]

In the Flink 1.4 development documentation, clicking the left-hand menu does not navigate, and none of the drop-down menus can be expanded.

 





[jira] [Commented] (FLINK-8516) FlinkKinesisConsumer does not balance shards over subtasks

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354988#comment-16354988
 ] 

ASF GitHub Bot commented on FLINK-8516:
---

Github user tweise commented on a diff in the pull request:

https://github.com/apache/flink/pull/5393#discussion_r166523133
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisShardAssigner.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kinesis.util;
--- End diff --

I only see references to the partitioner from the package 
org.apache.flink.streaming.connectors.kinesis and not from subpackages - which 
is what I pointed out before. Since you seem to suggest that isn't a 
convention, I will move the class (to me it is just an observation and not 
important).


> FlinkKinesisConsumer does not balance shards over subtasks
> --
>
> Key: FLINK-8516
> URL: https://issues.apache.org/jira/browse/FLINK-8516
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>
> The hash code of the shard is used to distribute discovered shards over 
> subtasks round robin. This works as long as shard identifiers are sequential. 
> After shards are rebalanced in Kinesis, that may no longer be the case and 
> the distribution becomes skewed.





[GitHub] flink pull request #5393: [FLINK-8516] Allow for custom hash function for sh...

2018-02-06 Thread tweise
Github user tweise commented on a diff in the pull request:

https://github.com/apache/flink/pull/5393#discussion_r166523133
  
--- Diff: 
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisShardAssigner.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kinesis.util;
--- End diff --

I only see references to the partitioner from the package 
org.apache.flink.streaming.connectors.kinesis and not from subpackages - which 
is what I pointed out before. Since you seem to suggest that isn't a 
convention, I will move the class (to me it is just an observation and not 
important).


---


[jira] [Closed] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread John Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Fang closed FLINK-8567.

Resolution: Fixed

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: exclude_guava.patch
>
>
> Maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> decompiling the flink-table jar and found that the Guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the original Google 
> Guava and the Flink-shaded Guava. Why is the Google Guava still included in 
> Flink?
>  Because we build flink-dist with the Maven assembly plugin, which aggregates 
> all dependencies, including those that have already been shaded. With the 
> following patch, which excludes Guava, flink-dist_${scala.binary.version}.jar 
> contains only flink-shaded-guava, not the Google Guava.





[jira] [Commented] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread John Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354883#comment-16354883
 ] 

John Fang commented on FLINK-8567:
--

[~Zentol] Thank you. I have made a mistake. 

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: exclude_guava.patch
>
>
> Maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> decompiling the flink-table jar and found that the Guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the original Google 
> Guava and the Flink-shaded Guava. Why is the Google Guava still included in 
> Flink?
>  Because we build flink-dist with the Maven assembly plugin, which aggregates 
> all dependencies, including those that have already been shaded. With the 
> following patch, which excludes Guava, flink-dist_${scala.binary.version}.jar 
> contains only flink-shaded-guava, not the Google Guava.





[jira] [Commented] (FLINK-8553) switch flink-metrics-datadog to async mode

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354788#comment-16354788
 ] 

ASF GitHub Bot commented on FLINK-8553:
---

GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/5418

[FLINK-8553] switch flink-metrics-datadog to async mode

## What is the purpose of the change

Currently, DatadogHttpClient in flink-metrics-datadog sends HTTP requests 
synchronously. A request can take up to 3 seconds depending on network 
conditions, and may slow Flink down.

Switching DatadogHttpClient to async mode.

Some benchmarking, with averages from 20 rounds:
- with 20 metrics, async took 2 ms and sync took 150 ms
- with 200 metrics, async took 5 ms and sync took 295 ms

So switching to async improves performance by roughly 50-70x.

## Brief change log

Switching DatadogHttpClient to async mode.

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

none

## Documentation

none

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-8553

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5418


commit 19c48d78702329687584295c0370434318527445
Author: Bowen Li 
Date:   2018-02-07T00:11:18Z

[FLINK-8553] switch flink-metrics-datadog to async mode




> switch flink-metrics-datadog to async mode
> --
>
> Key: FLINK-8553
> URL: https://issues.apache.org/jira/browse/FLINK-8553
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.4.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
> Fix For: 1.5.0
>
>
> Even though flink-metrics-datadog is designed as `fire-and-forget`, it 
> currently uses synchronous calls, which may block or slow down the core. We 
> need to switch it to async mode.
> cc  [~Zentol]
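The sync-versus-async distinction described above can be sketched with a plain executor. This is an illustrative sketch only; the class and method names are hypothetical and the actual DatadogHttpClient may use a different HTTP stack:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncReporterSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Sync mode: the reporter thread blocks for the full network round trip.
    public String sendSync(String payload) throws Exception {
        return post(payload);
    }

    // Async mode: submit and return immediately; the request runs on a
    // background thread, so a slow network cannot stall the reporter thread.
    public Future<String> sendAsync(String payload) {
        return executor.submit(() -> post(payload));
    }

    public void shutdown() {
        executor.shutdown();
    }

    private String post(String payload) throws InterruptedException {
        Thread.sleep(50); // simulate network latency
        return "accepted:" + payload;
    }

    public static void main(String[] args) throws Exception {
        AsyncReporterSketch reporter = new AsyncReporterSketch();
        long start = System.nanoTime();
        Future<String> pending = reporter.sendAsync("metrics-batch");
        long submitMillis = (System.nanoTime() - start) / 1_000_000;
        // Submission returns in well under the 50 ms simulated round trip.
        System.out.println("submit took ~" + submitMillis + " ms");
        System.out.println(pending.get());
        reporter.shutdown();
    }
}
```

The caller of `sendAsync` pays only the cost of queueing the task, which matches the millisecond-scale async numbers in the pull request benchmark.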





[GitHub] flink pull request #5418: [FLINK-8553] switch flink-metrics-datadog to async...

2018-02-06 Thread bowenli86
GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/5418

[FLINK-8553] switch flink-metrics-datadog to async mode

## What is the purpose of the change

Currently, DatadogHttpClient in flink-metrics-datadog sends HTTP requests 
synchronously. A request can take up to 3 seconds depending on network 
conditions, and may slow Flink down.

Switching DatadogHttpClient to async mode.

Some benchmarking, with averages from 20 rounds:
- with 20 metrics, async took 2 ms and sync took 150 ms
- with 200 metrics, async took 5 ms and sync took 295 ms

So switching to async improves performance by roughly 50-70x.

## Brief change log

Switching DatadogHttpClient to async mode.

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

none

## Documentation

none

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-8553

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5418


commit 19c48d78702329687584295c0370434318527445
Author: Bowen Li 
Date:   2018-02-07T00:11:18Z

[FLINK-8553] switch flink-metrics-datadog to async mode




---


[jira] [Commented] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-02-06 Thread Shuyi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354623#comment-16354623
 ] 

Shuyi Chen commented on FLINK-8562:
---

Hi, [~tzulitai], could you please help take a look at this? Thanks a lot.

> Fix YARNSessionFIFOSecuredITCase
> 
>
> Key: FLINK-8562
> URL: https://issues.apache.org/jira/browse/FLINK-8562
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Major
>
> Currently, YARNSessionFIFOSecuredITCase will not fail even if the current 
> Flink YARN Kerberos integration is failing in production. Please see 
> FLINK-8275.





[jira] [Closed] (FLINK-8559) Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8559.
---
Resolution: Fixed

master: dbb81acb5a1d0f2a9521c6eef7eeb2436bb8004d

> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to 
> get stuck
> -
>
> Key: FLINK-8559
> URL: https://issues.apache.org/jira/browse/FLINK-8559
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> In the {{RocksDBKeyedStatebackend#snapshotIncrementally}} we can find this 
> code
>  
> {code:java}
> final RocksDBIncrementalSnapshotOperation snapshotOperation =
>   new RocksDBIncrementalSnapshotOperation<>(
>   this,
>   checkpointStreamFactory,
>   checkpointId,
>   checkpointTimestamp);
> snapshotOperation.takeSnapshot();
> return new FutureTask(
>   new Callable() {
>   @Override
>   public KeyedStateHandle call() throws Exception {
>   return snapshotOperation.materializeSnapshot();
>   }
>   }
> ) {
>   @Override
>   public boolean cancel(boolean mayInterruptIfRunning) {
>   snapshotOperation.stop();
>   return super.cancel(mayInterruptIfRunning);
>   }
>   @Override
>   protected void done() {
>   snapshotOperation.releaseResources(isCancelled());
>   }
> };
> {code}
> In the constructor of RocksDBIncrementalSnapshotOperation we call 
> {{acquireResource()}} on the RocksDB {{ResourceGuard}}. If 
> {{snapshotOperation.takeSnapshot()}} fails with an exception, these resources 
> are never released. When the task is shut down due to the exception, it will 
> get stuck on releasing RocksDB.
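The leak pattern and the shape of a fix can be sketched in miniature. This is a simplified illustration, not the actual backend code: `ResourceGuard` here is a bare lease counter standing in for `org.apache.flink.util.ResourceGuard`, and the point is that a lease acquired before `takeSnapshot()` must be released on the exception path, because the `done()` hook only runs for a future that was actually scheduled.

```java
public class SnapshotLeakSketch {
    // Simplified stand-in for ResourceGuard's lease counting.
    static class ResourceGuard {
        int openLeases = 0;
        void acquireResource() { openLeases++; }
        void releaseResource() { openLeases--; }
    }

    static final ResourceGuard GUARD = new ResourceGuard();

    // Fix shape: if takeSnapshot() throws, release the lease before rethrowing;
    // otherwise RocksDB disposal blocks forever on the open lease.
    static void snapshotSafely(boolean fail) {
        GUARD.acquireResource(); // acquired in the snapshot operation's constructor
        try {
            takeSnapshot(fail);
        } catch (RuntimeException e) {
            GUARD.releaseResource();
            throw e;
        }
    }

    static void takeSnapshot(boolean fail) {
        if (fail) {
            throw new RuntimeException("takeSnapshot failed");
        }
    }

    public static void main(String[] args) {
        try {
            snapshotSafely(true);
        } catch (RuntimeException expected) {
            // the failure still propagates, but no lease is left open
        }
        System.out.println("open leases after failure: " + GUARD.openLeases); // 0
    }
}
```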





[jira] [Closed] (FLINK-8550) Iterate over entryset instead of keys

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8550.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

master: 5e41eaab9c9feff743ca79ddbee11704ceaa2b2d

> Iterate over entryset instead of keys
> -
>
> Key: FLINK-8550
> URL: https://issues.apache.org/jira/browse/FLINK-8550
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
> Fix For: 1.5.0
>
>
> Iterate over entry sets instead of keys when we want to get both the key and 
> the value. For example, in {{ProcTimeBoundedRangeOver}}:
> {code:java}
> val iter = rowMapState.keys.iterator
> val markToRemove = new ArrayList[Long]()
> while (iter.hasNext) {
>   val elementKey = iter.next
>   if (elementKey < limit) {
>     ...
>     val elementsRemove = rowMapState.get(elementKey)
>     ...
>   }
> }
> {code}
>  
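The suggested change can be sketched with a plain java.util.Map (an illustrative sketch; the actual code uses Flink's MapState, where each get() is a separate state access and therefore more expensive to repeat):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntrySetIterationSketch {
    // Returns the keys below the limit, reading each value from the entry
    // directly instead of issuing an extra get() per key.
    static List<Long> keysBelowLimit(Map<Long, String> rowMapState, long limit) {
        List<Long> markToRemove = new ArrayList<>();
        for (Map.Entry<Long, String> entry : rowMapState.entrySet()) {
            if (entry.getKey() < limit) {
                String elementsRemove = entry.getValue(); // value already at hand
                markToRemove.add(entry.getKey());
            }
        }
        return markToRemove;
    }

    public static void main(String[] args) {
        Map<Long, String> state = new HashMap<>();
        state.put(1L, "a");
        state.put(5L, "b");
        state.put(9L, "c");
        System.out.println(keysBelowLimit(state, 6L)); // keys 1 and 5
    }
}
```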





[jira] [Updated] (FLINK-7803) Update savepoint Documentation

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-7803:

Affects Version/s: 1.5.0
   1.4.0

> Update savepoint Documentation
> --
>
> Key: FLINK-7803
> URL: https://issues.apache.org/jira/browse/FLINK-7803
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, State Backends, Checkpointing
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Razvan
>Assignee: Razvan
>Priority: Major
> Fix For: 1.5.0
>
>
> Can you please update 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
>  to specify that the savepoint location *MUST* always be a location 
> accessible by all hosts?
> I spent quite some time believing it's a bug and trying to find solutions; 
> see https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the 
> current documentation, and others might also waste time believing it's an 
> actual issue.





[jira] [Closed] (FLINK-7803) Update savepoint Documentation

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-7803.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

master: f8afc9ff3a60c33fbf39ebdc345b4bb4e4cd9de9

> Update savepoint Documentation
> --
>
> Key: FLINK-7803
> URL: https://issues.apache.org/jira/browse/FLINK-7803
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, State Backends, Checkpointing
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Razvan
>Assignee: Razvan
>Priority: Major
> Fix For: 1.5.0
>
>
> Can you please update 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
>  to specify the savepoint location *MUST* always be a location accessible by 
> all hosts?
> I spent quite some time believing it's a bug and trying to find solutions, 
> see https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the 
> current documentation and others might waste time also believing it's an 
> actual issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7608) LatencyGauge change to histogram metric

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-7608:

Issue Type: Improvement  (was: Bug)

> LatencyGauge change to  histogram metric
> 
>
> Key: FLINK-7608
> URL: https://issues.apache.org/jira/browse/FLINK-7608
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Reporter: Hai Zhou UTC+8
>Assignee: Hai Zhou UTC+8
>Priority: Major
> Fix For: 1.5.0
>
>
> I used slf4jReporter [https://issues.apache.org/jira/browse/FLINK-4831] to 
> export metrics to the log file.
> I found:
> {noformat}
> -- Gauges 
> -
> ..
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, 
> p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.836}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Sink- Unnamed.0.latency: 
> value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, 
> p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-7608) LatencyGauge change to histogram metric

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-7608.
---
Resolution: Fixed

master: e40cb34868e2c6ff7548653e1e5e2dfbe7d47967

> LatencyGauge change to  histogram metric
> 
>
> Key: FLINK-7608
> URL: https://issues.apache.org/jira/browse/FLINK-7608
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Reporter: Hai Zhou UTC+8
>Assignee: Hai Zhou UTC+8
>Priority: Major
> Fix For: 1.5.0
>
>
> I used slf4jReporter [https://issues.apache.org/jira/browse/FLINK-4831] to 
> export metrics to the log file.
> I found:
> {noformat}
> -- Gauges 
> -
> ..
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, 
> p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.836}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Sink- Unnamed.0.latency: 
> value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, 
> p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8565) CheckpointOptionsTest unstable

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8565.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

master: 2479ff53cf33e8150c987e8368f112e747233ce9

> CheckpointOptionsTest unstable
> --
>
> Key: FLINK-8565
> URL: https://issues.apache.org/jira/browse/FLINK-8565
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.0
>
>
> https://travis-ci.org/zentol/flink/jobs/337945528
> {code}
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.239 sec <<< 
> FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointOptionsTest
> testSavepoint(org.apache.flink.runtime.checkpoint.CheckpointOptionsTest)  
> Time elapsed: 0.037 sec  <<< ERROR!
> java.lang.IllegalArgumentException: null
>   at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123)
>   at 
> org.apache.flink.runtime.state.CheckpointStorageLocationReference.(CheckpointStorageLocationReference.java:60)
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointOptionsTest.testSavepoint(CheckpointOptionsTest.java:55)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7803) Update savepoint Documentation

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-7803:

Component/s: State Backends, Checkpointing

> Update savepoint Documentation
> --
>
> Key: FLINK-7803
> URL: https://issues.apache.org/jira/browse/FLINK-7803
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, State Backends, Checkpointing
>Reporter: Razvan
>Assignee: Razvan
>Priority: Major
>
> Can you please update 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
>  to specify the savepoint location *MUST* always be a location accessible by 
> all hosts?
> I spent quite some time believing it's a bug and trying to find solutions, 
> see https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the 
> current documentation and others might waste time also believing it's an 
> actual issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7803) Update savepoint Documentation

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-7803:

Issue Type: Improvement  (was: Bug)

> Update savepoint Documentation
> --
>
> Key: FLINK-7803
> URL: https://issues.apache.org/jira/browse/FLINK-7803
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, State Backends, Checkpointing
>Reporter: Razvan
>Assignee: Razvan
>Priority: Major
>
> Can you please update 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html
>  to specify the savepoint location *MUST* always be a location accessible by 
> all hosts?
> I spent quite some time believing it's a bug and trying to find solutions, 
> see https://issues.apache.org/jira/browse/FLINK-7750. It's not obvious in the 
> current documentation and others might waste time also believing it's an 
> actual issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5404: [FLINK-8550][table] Iterate over entryset instead ...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5404


---


[GitHub] flink pull request #5417: [FLINK-8565][tests] Ensure locationBytes.length > ...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5417


---


[jira] [Commented] (FLINK-8559) Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354552#comment-16354552
 ] 

ASF GitHub Bot commented on FLINK-8559:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5412


> Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to 
> get stuck
> -
>
> Key: FLINK-8559
> URL: https://issues.apache.org/jira/browse/FLINK-8559
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> In the {{RocksDBKeyedStatebackend#snapshotIncrementally}} we can find this 
> code
>  
> {code:java}
> final RocksDBIncrementalSnapshotOperation snapshotOperation =
>   new RocksDBIncrementalSnapshotOperation<>(
>   this,
>   checkpointStreamFactory,
>   checkpointId,
>   checkpointTimestamp);
> snapshotOperation.takeSnapshot();
> return new FutureTask(
>   new Callable() {
>   @Override
>   public KeyedStateHandle call() throws Exception {
>   return snapshotOperation.materializeSnapshot();
>   }
>   }
> ) {
>   @Override
>   public boolean cancel(boolean mayInterruptIfRunning) {
>   snapshotOperation.stop();
>   return super.cancel(mayInterruptIfRunning);
>   }
>   @Override
>   protected void done() {
>   snapshotOperation.releaseResources(isCancelled());
>   }
> };
> {code}
> In the constructor of RocksDBIncrementalSnapshotOperation we call 
> {{aquireResource()}} on the RocksDB {{ResourceGuard}}. If 
> {{snapshotOperation.takeSnapshot()}} fails with an exception these resources 
> are never released. When the task is shutdown due to the exception it will 
> get stuck on releasing RocksDB.
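> 
> The leak follows the classic acquire-in-constructor shape: the resource is acquired before the try/finally that would release it, so an exception in the intermediate `takeSnapshot()` step leaves it held forever. A minimal, Flink-independent sketch of the safer pattern (hypothetical `Lease` interface standing in for a RocksDB `ResourceGuard` lease):

```java
public class GuardedSnapshot {

    // Stand-in for a ResourceGuard lease (hypothetical interface).
    interface Lease {
        void release();
    }

    // Runs the snapshot step and releases the lease eagerly on failure;
    // on success the caller is expected to release it in its done() callback.
    static String takeSnapshot(Lease lease, boolean failSnapshot) {
        try {
            if (failSnapshot) {
                throw new IllegalStateException("takeSnapshot failed");
            }
            return "snapshot-metadata";
        } catch (RuntimeException e) {
            // Without this release, task shutdown blocks on disposing RocksDB.
            lease.release();
            throw e;
        }
    }
}
```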



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8550) Iterate over entryset instead of keys

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354551#comment-16354551
 ] 

ASF GitHub Bot commented on FLINK-8550:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5404


> Iterate over entryset instead of keys
> -
>
> Key: FLINK-8550
> URL: https://issues.apache.org/jira/browse/FLINK-8550
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>
> Iterate over entrysets instead of keys when we want to get both key and 
> value. 
> For example, in {{ProcTimeBoundedRangeOver}}:
> {code:java}
> val iter = rowMapState.keys.iterator
> val markToRemove = new ArrayList[Long]()
> while (iter.hasNext) {
>   val elementKey = iter.next
>   if (elementKey < limit) {
>     ...
>     val elementsRemove = rowMapState.get(elementKey)
>     ...
>   }
> }
> {code}
>  
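> 
> The quoted Scala snippet walks the keys and then issues a second `rowMapState.get(elementKey)` lookup per key. The improvement being proposed, shown here in plain Java terms with `java.util.Map` rather than Flink's `MapState` API (a minimal sketch, not the actual patch), is to iterate the entry set so key and value arrive together:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EntrySetIteration {

    // Collect every key below `limit`, reading its value in the same pass
    // instead of issuing a second state.get(key) lookup per matching key.
    static List<Long> keysBelowLimit(Map<Long, String> state, long limit) {
        List<Long> markToRemove = new ArrayList<>();
        for (Map.Entry<Long, String> entry : state.entrySet()) {
            if (entry.getKey() < limit) {
                String element = entry.getValue(); // value already at hand
                if (element != null) {
                    markToRemove.add(entry.getKey());
                }
            }
        }
        return markToRemove;
    }
}
```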



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5407: [hotfix][build] Fix duplicate maven enforce plugin...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5407


---


[GitHub] flink pull request #5406: [hotfix] Fix typos in comments.

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5406


---


[jira] [Commented] (FLINK-7608) LatencyGauge change to histogram metric

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354553#comment-16354553
 ] 

ASF GitHub Bot commented on FLINK-7608:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5161


> LatencyGauge change to  histogram metric
> 
>
> Key: FLINK-7608
> URL: https://issues.apache.org/jira/browse/FLINK-7608
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Hai Zhou UTC+8
>Assignee: Hai Zhou UTC+8
>Priority: Major
> Fix For: 1.5.0
>
>
> I used slf4jReporter [https://issues.apache.org/jira/browse/FLINK-4831] to 
> export metrics to the log file.
> I found:
> {noformat}
> -- Gauges 
> -
> ..
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, 
> p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.836}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Sink- Unnamed.0.latency: 
> value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, 
> p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8565) CheckpointOptionsTest unstable

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354554#comment-16354554
 ] 

ASF GitHub Bot commented on FLINK-8565:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5417


> CheckpointOptionsTest unstable
> --
>
> Key: FLINK-8565
> URL: https://issues.apache.org/jira/browse/FLINK-8565
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>
> https://travis-ci.org/zentol/flink/jobs/337945528
> {code}
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.239 sec <<< 
> FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointOptionsTest
> testSavepoint(org.apache.flink.runtime.checkpoint.CheckpointOptionsTest)  
> Time elapsed: 0.037 sec  <<< ERROR!
> java.lang.IllegalArgumentException: null
>   at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123)
>   at 
> org.apache.flink.runtime.state.CheckpointStorageLocationReference.(CheckpointStorageLocationReference.java:60)
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointOptionsTest.testSavepoint(CheckpointOptionsTest.java:55)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5161: [FLINK-7608][metric] Refactor latency statistics m...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5161


---


[GitHub] flink pull request #5408: [hotfix][docs] Fix typos in windows document

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5408


---


[GitHub] flink pull request #5412: [FLINK-8559][RocksDB] Release resources if snapsho...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5412


---


[jira] [Commented] (FLINK-8275) Flink YARN deployment with Kerberos enabled not working

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354391#comment-16354391
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8275:


Merged.

1.5 - 97f0cac2af3a1140fa68090d94d83e009ad1e684
1.4 - 82f3957811d2e5bcefabaa42326d3d4476e45df0

> Flink YARN deployment with Kerberos enabled not working 
> 
>
> Key: FLINK-8275
> URL: https://issues.apache.org/jira/browse/FLINK-8275
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.4.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The local keytab path in YarnTaskManagerRunner is incorrectly set to the 
> ApplicationMaster's local keytab path. This causes jobs to fail because the 
> TaskManager can't read the keytab.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-8275) Flink YARN deployment with Kerberos enabled not working

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-8275.

Resolution: Fixed

> Flink YARN deployment with Kerberos enabled not working 
> 
>
> Key: FLINK-8275
> URL: https://issues.apache.org/jira/browse/FLINK-8275
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.4.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The local keytab path in YarnTaskManagerRunner is incorrectly set to the 
> ApplicationMaster's local keytab path. This causes jobs to fail because the 
> TaskManager can't read the keytab.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8421) HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354388#comment-16354388
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8421:


Merged.

1.5 - a79916b88ef3d97c11980c25d3887d43964c152d
1.4 - fb1e24e25eddaded9668041ba4f24e35384962c3

> HeapInternalTimerService should reconfigure compatible key / namespace 
> serializers on restore
> -
>
> Key: FLINK-8421
> URL: https://issues.apache.org/jira/browse/FLINK-8421
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The {{HeapInternalTimerService}} still uses simple {{equals}} checks on 
> restored / newly provided serializers for compatibility checks. This should 
> be replaced with the {{TypeSerializer::ensureCompatibility}} checks instead, 
> so that new serializers can be reconfigured.
> This would entail that the {{TypeSerializerConfiguration}} of the key and 
> namespace serializer in the {{HeapInternalTimerService}} also needs to be 
> written to the raw state.
> For Flink 1.4.0 release and current master, this is a critical bug since the 
> {{KryoSerializer}} has different default base registrations than before due 
> to FLINK-7420, i.e. if the key of a window is serialized using the 
> {{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
> For 1.3.x, this fix would be an improvement, such that the 
> {{HeapInternalTimerService}} restore will make use of serializer 
> reconfiguration.
> Other remarks:
> * We need to double check all operators that checkpoint / restore from 
> **raw** state. Apparently, the serializer compatibility checks were only 
> implemented for managed state.
> * Migration ITCases apparently do not have enough coverage. A migration test 
> job that uses a key type which required the {{KryoSerializer}}, and uses 
> windows, would have caught this issue.
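> 
> The gap between the two checks can be sketched generically. This models serializer configuration as a set of registered type names (a simplification with hypothetical method names; Flink's real mechanism is {{TypeSerializer::ensureCompatibility}} driven by the serializer's config snapshot): an `equals`-style check rejects any difference, while a compatibility-style check lets a superset of the restored registrations be absorbed by reconfiguration.

```java
import java.util.Set;

public class SerializerCompat {

    enum Compatibility { AS_IS, AFTER_RECONFIGURATION, INCOMPATIBLE }

    // equals()-style check: any difference in registrations (e.g. new default
    // Kryo registrations added in 1.4) fails the restore outright.
    static boolean strictEquals(Set<String> restored, Set<String> current) {
        return restored.equals(current);
    }

    // Compatibility-style check: extra registrations in the new serializer can
    // be absorbed by reconfiguring it, instead of rejecting the restore.
    static Compatibility check(Set<String> restored, Set<String> current) {
        if (current.equals(restored)) {
            return Compatibility.AS_IS;
        }
        if (current.containsAll(restored)) {
            return Compatibility.AFTER_RECONFIGURATION;
        }
        return Compatibility.INCOMPATIBLE;
    }
}
```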



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8472) Extend migration tests for Flink 1.4

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-8472.
--

> Extend migration tests for Flink 1.4
> 
>
> Key: FLINK-8472
> URL: https://issues.apache.org/jira/browse/FLINK-8472
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The following migration tests should be extended to cover migrating Flink 
> jobs with a 1.4 savepoint.
>  * {{WindowOperatorMigrationTest}}
>  * {{CEPMigrationTest}}
>  * {{StatefulJobSavepointMigrationTestITCase}}
>  * {{FlinkKinesisConsumerMigrationTest}}
>  * {{FlinkKafkaConsumerBaseMigrationTest}}
>  * {{ContinuousFileProcessingMigrationTest}}
>  * {{BucketingSinkMigrationTest}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8421) HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-8421.
--
Resolution: Fixed

> HeapInternalTimerService should reconfigure compatible key / namespace 
> serializers on restore
> -
>
> Key: FLINK-8421
> URL: https://issues.apache.org/jira/browse/FLINK-8421
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The {{HeapInternalTimerService}} still uses simple {{equals}} checks on 
> restored / newly provided serializers for compatibility checks. This should 
> be replaced with the {{TypeSerializer::ensureCompatibility}} checks instead, 
> so that new serializers can be reconfigured.
> This would entail that the {{TypeSerializerConfiguration}} of the key and 
> namespace serializer in the {{HeapInternalTimerService}} also needs to be 
> written to the raw state.
> For Flink 1.4.0 release and current master, this is a critical bug since the 
> {{KryoSerializer}} has different default base registrations than before due 
> to FLINK-7420, i.e. if the key of a window is serialized using the 
> {{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
> For 1.3.x, this fix would be an improvement, such that the 
> {{HeapInternalTimerService}} restore will make use of serializer 
> reconfiguration.
> Other remarks:
> * We need to double check all operators that checkpoint / restore from 
> **raw** state. Apparently, the serializer compatibility checks were only 
> implemented for managed state.
> * Migration ITCases apparently do not have enough coverage. A migration test 
> job that uses a key type which required the {{KryoSerializer}}, and uses 
> windows, would have caught this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8472) Extend migration tests for Flink 1.4

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354386#comment-16354386
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8472:


Merged.

1.5 - 130ca4ec3895825022c2a05c0022684ab8c1a3bf
1.4 - f68fe3ee652820ea29653c17c7515bd466389ce9

> Extend migration tests for Flink 1.4
> 
>
> Key: FLINK-8472
> URL: https://issues.apache.org/jira/browse/FLINK-8472
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The following migration tests should be extended to cover migrating Flink 
> jobs with a 1.4 savepoint.
>  * {{WindowOperatorMigrationTest}}
>  * {{CEPMigrationTest}}
>  * {{StatefulJobSavepointMigrationTestITCase}}
>  * {{FlinkKinesisConsumerMigrationTest}}
>  * {{FlinkKafkaConsumerBaseMigrationTest}}
>  * {{ContinuousFileProcessingMigrationTest}}
>  * {{BucketingSinkMigrationTest}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-8484) Kinesis consumer re-reads closed shards on job restart

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-8484.

Resolution: Fixed

> Kinesis consumer re-reads closed shards on job restart
> --
>
> Key: FLINK-8484
> URL: https://issues.apache.org/jira/browse/FLINK-8484
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Philip Luppens
>Assignee: Philip Luppens
>Priority: Blocker
>  Labels: bug, flink, kinesis
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> We’re using the connector to subscribe to streams varying from 1 to 100 
> shards, and used the kinesis-scaling-utils to dynamically scale the Kinesis 
> stream up and down during peak times. What we’ve noticed is that, once we had 
> closed shards, any Flink job restart with check- or save-point 
> would result in shards being re-read from the event horizon, duplicating our 
> events.
>  
> We started checking the checkpoint state, and found that the shards were 
> stored correctly with the proper sequence number (including for closed 
> shards), but that upon restarts, the older closed shards would be read from 
> the event horizon, as if their restored state would be ignored.
>  
> In the end, we believe that we found the problem: in the 
> FlinkKinesisConsumer’s run() method, we’re trying to find the shard returned 
> from the KinesisDataFetcher against the shards’ metadata from the restoration 
> point, but we do this via a containsKey() call, which means we’ll use the 
> StreamShardMetadata’s equals() method. However, this checks for all 
> properties, including the endingSequenceNumber, which might have changed 
> between the restored state’s checkpoint and our data fetch, thus failing the 
> equality check, failing the containsKey() check, and resulting in the shard 
> being re-read from the event horizon, even though it was present in the 
> restored state.
>  
> We’ve created a workaround where we only check for the shardId and stream 
> name to restore the state of the shards we’ve already seen, and this seems to 
> work correctly. 
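> 
> The workaround the reporter describes — matching restored shards on their immutable identity fields rather than on full metadata equality — can be sketched as below. The field and class names are illustrative stand-ins for {{StreamShardMetadata}}; the point is that {{endingSequenceNumber}} is excluded because it changes when a shard is closed after the checkpoint:

```java
import java.util.Objects;

public class ShardIdentity {

    final String streamName;
    final String shardId;
    final String endingSequenceNumber; // mutable: set once the shard closes

    ShardIdentity(String streamName, String shardId, String endingSequenceNumber) {
        this.streamName = streamName;
        this.shardId = shardId;
        this.endingSequenceNumber = endingSequenceNumber;
    }

    // Match on the immutable identity fields only, so a shard that was closed
    // after the checkpoint still matches its restored state and is not
    // re-read from the event horizon.
    static boolean sameShard(ShardIdentity a, ShardIdentity b) {
        return Objects.equals(a.streamName, b.streamName)
                && Objects.equals(a.shardId, b.shardId);
    }
}
```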



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8484) Kinesis consumer re-reads closed shards on job restart

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354385#comment-16354385
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8484:


Merged.

1.5 - 140bb0c66257d5d09d61845731aef1dbdb90a0bd
1.4 - 7884a4f749b8e7ba7b831963cf2a3be12df877b7

> Kinesis consumer re-reads closed shards on job restart
> --
>
> Key: FLINK-8484
> URL: https://issues.apache.org/jira/browse/FLINK-8484
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Philip Luppens
>Assignee: Philip Luppens
>Priority: Blocker
>  Labels: bug, flink, kinesis
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> We’re using the connector to subscribe to streams varying from 1 to 100 
> shards, and used the kinesis-scaling-utils to dynamically scale the Kinesis 
> stream up and down during peak times. What we’ve noticed is that, once we had 
> closed shards, any Flink job restart with check- or save-point 
> would result in shards being re-read from the event horizon, duplicating our 
> events.
>  
> We started checking the checkpoint state, and found that the shards were 
> stored correctly with the proper sequence number (including for closed 
> shards), but that upon restarts, the older closed shards would be read from 
> the event horizon, as if their restored state would be ignored.
>  
> In the end, we believe that we found the problem: in the 
> FlinkKinesisConsumer’s run() method, we’re trying to find the shard returned 
> from the KinesisDataFetcher against the shards’ metadata from the restoration 
> point, but we do this via a containsKey() call, which means we’ll use the 
> StreamShardMetadata’s equals() method. However, this checks for all 
> properties, including the endingSequenceNumber, which might have changed 
> between the restored state’s checkpoint and our data fetch, thus failing the 
> equality check, failing the containsKey() check, and resulting in the shard 
> being re-read from the event horizon, even though it was present in the 
> restored state.
>  
> We’ve created a workaround where we only check for the shardId and stream 
> name to restore the state of the shards we’ve already seen, and this seems to 
> work correctly. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-8472) Extend migration tests for Flink 1.4

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-8472.

Resolution: Fixed

> Extend migration tests for Flink 1.4
> 
>
> Key: FLINK-8472
> URL: https://issues.apache.org/jira/browse/FLINK-8472
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The following migration tests should be extended to cover migrating Flink 
> jobs with a 1.4 savepoint.
>  * {{WindowOperatorMigrationTest}}
>  * {{CEPMigrationTest}}
>  * {{StatefulJobSavepointMigrationTestITCase}}
>  * {{FlinkKinesisConsumerMigrationTest}}
>  * {{FlinkKafkaConsumerBaseMigrationTest}}
>  * {{ContinuousFileProcessingMigrationTest}}
>  * {{BucketingSinkMigrationTest}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8022) Kafka at-least-once tests fail occasionally

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354380#comment-16354380
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8022:


Tests disabled.

1.5 - 4197b0da100ab2da4b811847a401f17ce8bb1182

> Kafka at-least-once tests fail occasionally
> ---
>
> Key: FLINK-8022
> URL: https://issues.apache.org/jira/browse/FLINK-8022
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Nico Kruber
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> {{Kafka09ProducerITCase>KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink}}
>  seems to sporadically fail with missing data, like this execution:
> {code}
> 
> Test 
> testOneToOneAtLeastOnceRegularSink(org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase)
>  is running.
> 
> 17:54:30,195 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
> Creating topic oneToOneTopicRegularSink
> 17:54:30,196 INFO  
[jira] [Resolved] (FLINK-8022) Kafka at-least-once tests fail occasionally

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-8022.

Resolution: Resolved

> Kafka at-least-once tests fail occasionally
> ---
>
> Key: FLINK-8022
> URL: https://issues.apache.org/jira/browse/FLINK-8022
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Nico Kruber
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> {{Kafka09ProducerITCase>KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink}}
>  seems to sporadically fail with missing data, like this execution:
> {code}
> 
> Test 
> testOneToOneAtLeastOnceRegularSink(org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase)
>  is running.
> 
> 17:54:30,195 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
> Creating topic oneToOneTopicRegularSink
> 17:54:30,196 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - In 
> getZKUtils:: zookeeperConnectionString = 127.0.0.1:39436
> 17:54:30,204 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Starting ZkClient event thread.
> 17:54:30,204 INFO  org.I0Itec.zkclient.ZkClient   
>- Waiting for keeper state SyncConnected
> 17:54:30,240 INFO  org.I0Itec.zkclient.ZkClient   
>- zookeeper state changed (SyncConnected)
> 17:54:30,261 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Terminate ZkClient event thread.
> 17:54:30,265 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - Topic 
> oneToOneTopicRegularSink create request is successfully posted
> 17:54:30,366 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
> Validating if the topic oneToOneTopicRegularSink has been created or not
> 17:54:30,373 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - In 
> getZKUtils:: zookeeperConnectionString = 127.0.0.1:39436
> 17:54:30,374 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Starting ZkClient event thread.
> 17:54:30,374 INFO  org.I0Itec.zkclient.ZkClient   
>- Waiting for keeper state SyncConnected
> 17:54:30,404 INFO  org.I0Itec.zkclient.ZkClient   
>- zookeeper state changed (SyncConnected)
> 17:54:30,420 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - topic 
> oneToOneTopicRegularSink has been created successfully
> 17:54:30,421 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Terminate ZkClient event thread.
> 17:54:31,099 INFO  
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - 
> Starting FlinkKafkaProducer (1/1) to produce into default topic 
> oneToOneTopicRegularSink
> 17:55:05,229 ERROR 
> org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase  - 
> 
> Test 
> testOneToOneAtLeastOnceRegularSink(org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase)
>  failed with:
> java.lang.AssertionError: Expected to contain all of: <[0, 1, 2, 3, 4, 5, 6, 
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
> 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 
> 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 
> 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 
> 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 
> 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 
> 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 
> 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 
> 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 
> 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 
> 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 
> 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 
> 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 
> 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 
> 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 
> 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 
> 267, 268, 269, 

[jira] [Commented] (FLINK-8419) Kafka consumer's offset metrics are not registered for dynamically discovered partitions

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354377#comment-16354377
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8419:


Merged.

1.5 - 40f26c80476b884a6b7f3431562a78018fa014c5
1.4 - 432273aa963f46dc032c5007389a1a681b5188e6

> Kafka consumer's offset metrics are not registered for dynamically discovered 
> partitions
> 
>
> Key: FLINK-8419
> URL: https://issues.apache.org/jira/browse/FLINK-8419
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Metrics
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> Currently, the per-partition offset metrics are registered via the 
> {{AbstractFetcher#addOffsetStateGauge}} method. That method is only ever 
> called for the initial startup partitions, and not for dynamically discovered 
> partitions.
> We should consider adding some unit tests to make sure that metrics are 
> properly registered for all partitions. That would also safeguard us from 
> accidentally removing metrics.
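The registration pattern described above can be sketched as follows. This is a minimal, hypothetical model of per-partition offset gauges — the `OffsetMetrics` class and its method names are illustrative stand-ins, not Flink's actual `MetricGroup` / `AbstractFetcher` API; the point is that the same registration call must run for dynamically discovered partitions, not only the startup ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: a registry of per-partition offset gauges.
// Names are illustrative, not Flink's real metrics API.
class OffsetMetrics {
    private final Map<String, Supplier<Long>> gauges = new HashMap<>();

    /**
     * Registers an offset gauge for one partition. To avoid the bug
     * described in FLINK-8419, this must also be invoked for every
     * dynamically discovered partition, not just the initial ones.
     */
    void addOffsetGauge(String partition, Supplier<Long> offsetSupplier) {
        gauges.put(partition + ".currentOffset", offsetSupplier);
    }

    /** Test helper: checks that a partition's gauge was registered. */
    boolean hasGauge(String partition) {
        return gauges.containsKey(partition + ".currentOffset");
    }
}
```

A unit test along the lines the issue suggests would simply assert `hasGauge(...)` for both startup and discovered partitions after the fetcher adds them.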



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8419) Kafka consumer's offset metrics are not registered for dynamically discovered partitions

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-8419.
--
Resolution: Fixed

> Kafka consumer's offset metrics are not registered for dynamically discovered 
> partitions
> 
>
> Key: FLINK-8419
> URL: https://issues.apache.org/jira/browse/FLINK-8419
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Metrics
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> Currently, the per-partition offset metrics are registered via the 
> {{AbstractFetcher#addOffsetStateGauge}} method. That method is only ever 
> called for the initial startup partitions, and not for dynamically discovered 
> partitions.
> We should consider adding some unit tests to make sure that metrics are 
> properly registered for all partitions. That would also safeguard us from 
> accidentally removing metrics.





[jira] [Closed] (FLINK-8409) Race condition in KafkaConsumerThread leads to potential NPE

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-8409.
--
Resolution: Fixed

> Race condition in KafkaConsumerThread leads to potential NPE
> 
>
> Key: FLINK-8409
> URL: https://issues.apache.org/jira/browse/FLINK-8409
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The following lines in the {{KafkaConsumerThread::setOffsetsToCommit(...)}} 
> suggest a race condition with the asynchronous callback from committing 
> offsets to Kafka:
> {code}
> // record the work to be committed by the main consumer thread and make sure 
> the consumer notices that
> if (nextOffsetsToCommit.getAndSet(offsetsToCommit) != null) {
> log.warn("Committing offsets to Kafka takes longer than the checkpoint 
> interval. " +
> "Skipping commit of previous offsets because newer complete 
> checkpoint offsets are available. " +
> "This does not compromise Flink's checkpoint integrity.");
> }
> this.offsetCommitCallback = commitCallback;
> {code}
> In the main consumer thread's main loop, {{nextOffsetsToCommit}} will be 
> checked if there are any offsets to commit. If so, an asynchronous offset 
> commit operation will be performed. The NPE happens when the 
> commit completes before {{this.offsetCommitCallback = commitCallback;}} has 
> been reached.
> A possible fix is to make setting the next offsets to commit along with the 
> callback instance a single atomic operation.
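The proposed fix — setting the next offsets and the callback in one atomic operation — can be sketched by pairing them in a single `AtomicReference`. This is a simplified model, not Flink's actual `KafkaConsumerThread`: the `CommitRequest` type and `Runnable` callback are stand-ins for the real offsets map and `KafkaCommitCallback`.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the fix: offsets and callback are swapped
// together, so the committing thread can never observe new offsets
// paired with a stale (or null) callback.
class OffsetCommitState {

    /** Immutable pair of (offsets, callback), installed atomically. */
    static final class CommitRequest {
        final Map<String, Long> offsets;
        final Runnable callback; // stand-in for Flink's commit callback
        CommitRequest(Map<String, Long> offsets, Runnable callback) {
            this.offsets = offsets;
            this.callback = callback;
        }
    }

    private final AtomicReference<CommitRequest> nextCommit = new AtomicReference<>();

    /** Returns true if a previous, not-yet-committed request was overwritten. */
    boolean setOffsetsToCommit(Map<String, Long> offsets, Runnable callback) {
        // Single atomic swap: offsets and callback cannot be torn apart.
        return nextCommit.getAndSet(new CommitRequest(offsets, callback)) != null;
    }

    /** Consumer thread's main loop consumes the pending request, if any. */
    CommitRequest pollCommitRequest() {
        return nextCommit.getAndSet(null);
    }
}
```

Because the consumer thread receives the callback from the same `CommitRequest` it commits, the window in which the offsets are visible but the callback field is not yet assigned disappears.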





[jira] [Commented] (FLINK-8409) Race condition in KafkaConsumerThread leads to potential NPE

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354373#comment-16354373
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8409:


Merged.

1.4 - 3655200799929409352945a3f4fce0f3e987b9ad
1.5 - 08163009e443d00379696f9f84b3ccb0af6a25b6

> Race condition in KafkaConsumerThread leads to potential NPE
> 
>
> Key: FLINK-8409
> URL: https://issues.apache.org/jira/browse/FLINK-8409
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The following lines in the {{KafkaConsumerThread::setOffsetsToCommit(...)}} 
> suggest a race condition with the asynchronous callback from committing 
> offsets to Kafka:
> {code}
> // record the work to be committed by the main consumer thread and make sure 
> the consumer notices that
> if (nextOffsetsToCommit.getAndSet(offsetsToCommit) != null) {
> log.warn("Committing offsets to Kafka takes longer than the checkpoint 
> interval. " +
> "Skipping commit of previous offsets because newer complete 
> checkpoint offsets are available. " +
> "This does not compromise Flink's checkpoint integrity.");
> }
> this.offsetCommitCallback = commitCallback;
> {code}
> In the main consumer thread's main loop, {{nextOffsetsToCommit}} will be 
> checked if there are any offsets to commit. If so, an asynchronous offset 
> commit operation will be performed. The NPE happens when the 
> commit completes before {{this.offsetCommitCallback = commitCallback;}} has 
> been reached.
> A possible fix is to make setting the next offsets to commit along with the 
> callback instance a single atomic operation.





[jira] [Comment Edited] (FLINK-8409) Race condition in KafkaConsumerThread leads to potential NPE

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354373#comment-16354373
 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-8409 at 2/6/18 7:07 PM:


Merged.

1.5 - 3655200799929409352945a3f4fce0f3e987b9ad
 1.4 - 08163009e443d00379696f9f84b3ccb0af6a25b6


was (Author: tzulitai):
Merged.

1.4 - 3655200799929409352945a3f4fce0f3e987b9ad
1.5 - 08163009e443d00379696f9f84b3ccb0af6a25b6

> Race condition in KafkaConsumerThread leads to potential NPE
> 
>
> Key: FLINK-8409
> URL: https://issues.apache.org/jira/browse/FLINK-8409
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The following lines in the {{KafkaConsumerThread::setOffsetsToCommit(...)}} 
> suggest a race condition with the asynchronous callback from committing 
> offsets to Kafka:
> {code}
> // record the work to be committed by the main consumer thread and make sure 
> the consumer notices that
> if (nextOffsetsToCommit.getAndSet(offsetsToCommit) != null) {
> log.warn("Committing offsets to Kafka takes longer than the checkpoint 
> interval. " +
> "Skipping commit of previous offsets because newer complete 
> checkpoint offsets are available. " +
> "This does not compromise Flink's checkpoint integrity.");
> }
> this.offsetCommitCallback = commitCallback;
> {code}
> In the main consumer thread's main loop, {{nextOffsetsToCommit}} will be 
> checked if there are any offsets to commit. If so, an asynchronous offset 
> commit operation will be performed. The NPE happens when the 
> commit completes before {{this.offsetCommitCallback = commitCallback;}} has 
> been reached.
> A possible fix is to make setting the next offsets to commit along with the 
> callback instance a single atomic operation.





[jira] [Closed] (FLINK-8398) Stabilize flaky KinesisDataFetcherTests

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-8398.
--
Resolution: Fixed

> Stabilize flaky KinesisDataFetcherTests
> ---
>
> Key: FLINK-8398
> URL: https://issues.apache.org/jira/browse/FLINK-8398
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>
> The unit tests in {{KinesisDataFetcherTest}} have very flaky implementations. 
> They rely on thread sleeps to wait for a certain operation to happen, 
> which can easily miss and cause tests to fail.
> Although there aren't any reports of consistent failures on these tests yet 
> (as far as I am aware of), they can easily surface in the future.
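The usual way to stabilize such tests is to replace the sleep with an explicit synchronization point that the background operation triggers. A minimal sketch, with `LatchedFetcher` and its method names as hypothetical stand-ins for the real fetcher under test:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: block on a latch the operation counts down,
// instead of sleeping and hoping the operation has already happened.
class LatchedFetcher {
    private final CountDownLatch discoveryDone = new CountDownLatch(1);

    void runDiscovery() {
        // ... perform the discovery work under test ...
        discoveryDone.countDown(); // signal completion deterministically
    }

    /** Test helper: waits for completion instead of Thread.sleep(...). */
    boolean awaitDiscovery(long timeoutMillis) {
        try {
            return discoveryDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The timeout then only bounds how long a *failing* test runs; a passing test proceeds as soon as the operation completes, with no timing race.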





[jira] [Commented] (FLINK-6004) Allow FlinkKinesisConsumer to skip corrupted messages

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354371#comment-16354371
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6004:


Merged.

1.5 - 3062e29a8bbb667d098c43a0b225d5602049

> Allow FlinkKinesisConsumer to skip corrupted messages
> -
>
> Key: FLINK-6004
> URL: https://issues.apache.org/jira/browse/FLINK-6004
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0
>
>
> It is quite clear from the fix of FLINK-3679 that in reality, users might 
> encounter corrupted messages from Kafka, Kinesis, or external sources in 
> general when deserializing them.
> The consumers should support simply skipping those messages, by letting the 
> deserialization schema return {{null}}, and checking {{null}} values within 
> the consumer.
> This has been done for the Kafka consumer already. This ticket tracks the 
> improvement for the Kinesis consumer.
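The null-skipping contract described above can be sketched as follows. This is a simplified stand-in, not Flink's actual `KinesisDeserializationSchema` interface — the record type and method names are illustrative; the pattern is that the schema returns {{null}} for corrupted records and the consumer checks for it before emitting.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the skip-corrupted-records pattern:
// the schema signals corruption by returning null, and the consumer
// only emits records the schema could decode.
class SkippingDeserializer {

    /** Returns the decoded value, or null for a corrupted record. */
    static String deserialize(byte[] record) {
        try {
            String value = new String(record, StandardCharsets.UTF_8);
            if (value.isEmpty()) {
                return null; // treat empty payloads as corrupted (example rule)
            }
            return value;
        } catch (RuntimeException e) {
            return null; // corrupted record: tell the consumer to skip it
        }
    }

    /** Consumer-side check before emitting downstream. */
    static boolean shouldEmit(String deserialized) {
        return deserialized != null;
    }
}
```

With this contract, a corrupted record costs one skipped emission rather than a failed and restarted job.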





[jira] [Updated] (FLINK-6004) Allow FlinkKinesisConsumer to skip corrupted messages

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-6004:
---
Fix Version/s: 1.5.0

> Allow FlinkKinesisConsumer to skip corrupted messages
> -
>
> Key: FLINK-6004
> URL: https://issues.apache.org/jira/browse/FLINK-6004
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0
>
>
> It is quite clear from the fix of FLINK-3679 that in reality, users might 
> encounter corrupted messages from Kafka, Kinesis, or external sources in 
> general when deserializing them.
> The consumers should support simply skipping those messages, by letting the 
> deserialization schema return {{null}}, and checking {{null}} values within 
> the consumer.
> This has been done for the Kafka consumer already. This ticket tracks the 
> improvement for the Kinesis consumer.





[jira] [Closed] (FLINK-6004) Allow FlinkKinesisConsumer to skip corrupted messages

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-6004.
--
Resolution: Fixed

> Allow FlinkKinesisConsumer to skip corrupted messages
> -
>
> Key: FLINK-6004
> URL: https://issues.apache.org/jira/browse/FLINK-6004
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0
>
>
> It is quite clear from the fix of FLINK-3679 that in reality, users might 
> encounter corrupted messages from Kafka, Kinesis, or external sources in 
> general when deserializing them.
> The consumers should support simply skipping those messages, by letting the 
> deserialization schema return {{null}}, and checking {{null}} values within 
> the consumer.
> This has been done for the Kafka consumer already. This ticket tracks the 
> improvement for the Kinesis consumer.





[jira] [Commented] (FLINK-8398) Stabilize flaky KinesisDataFetcherTests

2018-02-06 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354370#comment-16354370
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-8398:


Merged.

1.4 - ca1e525c5c2ddb2b36fcdd7fe8d2e7b053a063cf
1.5 - 3f0d6c618302e4b341e5a442126e6ba2889cd2f4

> Stabilize flaky KinesisDataFetcherTests
> ---
>
> Key: FLINK-8398
> URL: https://issues.apache.org/jira/browse/FLINK-8398
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>
> The unit tests in {{KinesisDataFetcherTest}} have very flaky implementations. 
> They rely on thread sleeps to wait for a certain operation to happen, 
> which can easily miss and cause tests to fail.
> Although there aren't any reports of consistent failures on these tests yet 
> (as far as I am aware of), they can easily surface in the future.





[jira] [Commented] (FLINK-8484) Kinesis consumer re-reads closed shards on job restart

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354360#comment-16354360
 ] 

ASF GitHub Bot commented on FLINK-8484:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5337


> Kinesis consumer re-reads closed shards on job restart
> --
>
> Key: FLINK-8484
> URL: https://issues.apache.org/jira/browse/FLINK-8484
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Philip Luppens
>Assignee: Philip Luppens
>Priority: Blocker
>  Labels: bug, flink, kinesis
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> We’re using the connector to subscribe to streams varying from 1 to a 100 
> shards, and used the kinesis-scaling-utils to dynamically scale the Kinesis 
> stream up and down during peak times. What we’ve noticed is that, while we 
> were having closed shards, any Flink job restart with check- or save-point 
> would result in shards being re-read from the event horizon, duplicating our 
> events.
>  
> We started checking the checkpoint state, and found that the shards were 
> stored correctly with the proper sequence number (including for closed 
> shards), but that upon restarts, the older closed shards would be read from 
> the event horizon, as if their restored state would be ignored.
>  
> In the end, we believe we found the problem: in the 
> FlinkKinesisConsumer’s run() method, we try to match the shard returned 
> by the KinesisDataFetcher against the shards’ metadata from the restoration 
> point, but we do this via a containsKey() call, which uses the 
> StreamShardMetadata’s equals() method. However, equals() compares all 
> properties, including the endingSequenceNumber, which might have changed 
> between the restored state’s checkpoint and our data fetch. This fails the 
> equality check and the containsKey() check, resulting in the shard 
> being re-read from the event horizon, even though it was present in the 
> restored state.
>  
> We’ve created a workaround where we only check for the shardId and stream 
> name to restore the state of the shards we’ve already seen, and this seems to 
> work correctly. 
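The workaround described in the report — matching shards by stream name and shard id only, ignoring mutable fields — can be sketched with a dedicated identity key. `ShardIdentity` and its field names are hypothetical stand-ins, not Flink's actual `StreamShardMetadata` class:

```java
import java.util.Objects;

// Hypothetical sketch: an equality key that deliberately ignores
// mutable shard state such as endingSequenceNumber, so restored and
// freshly fetched descriptions of the same shard compare equal.
final class ShardIdentity {
    final String streamName;
    final String shardId;

    ShardIdentity(String streamName, String shardId) {
        this.streamName = streamName;
        this.shardId = shardId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ShardIdentity)) {
            return false;
        }
        ShardIdentity other = (ShardIdentity) o;
        // Only immutable identifying fields participate in equality.
        return streamName.equals(other.streamName) && shardId.equals(other.shardId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(streamName, shardId);
    }
}
```

Keying the restored-state map by such an identity makes the containsKey() lookup insensitive to fields that legitimately change between checkpoint and restart.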





[jira] [Commented] (FLINK-8398) Stabilize flaky KinesisDataFetcherTests

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354355#comment-16354355
 ] 

ASF GitHub Bot commented on FLINK-8398:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5268


> Stabilize flaky KinesisDataFetcherTests
> ---
>
> Key: FLINK-8398
> URL: https://issues.apache.org/jira/browse/FLINK-8398
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: 1.5.0, 1.4.1
>
>
> The unit tests in {{KinesisDataFetcherTest}} have very flaky implementations. 
> They rely on thread sleeps to wait for a certain operation to happen, 
> which can easily miss and cause tests to fail.
> Although there aren't any reports of consistent failures on these tests yet 
> (as far as I am aware of), they can easily surface in the future.





[jira] [Commented] (FLINK-8421) HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354363#comment-16354363
 ] 

ASF GitHub Bot commented on FLINK-8421:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5362


> HeapInternalTimerService should reconfigure compatible key / namespace 
> serializers on restore
> -
>
> Key: FLINK-8421
> URL: https://issues.apache.org/jira/browse/FLINK-8421
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The {{HeapInternalTimerService}} still uses simple {{equals}} checks on 
> restored / newly provided serializers for compatibility checks. This should 
> be replaced with the {{TypeSerializer::ensureCompatibility}} checks instead, 
> so that new serializers can be reconfigured.
> This would entail that the {{TypeSerializerConfiguration}} of the key and 
> namespace serializer in the {{HeapInternalTimerService}} also needs to be 
> written to the raw state.
> For Flink 1.4.0 release and current master, this is a critical bug since the 
> {{KryoSerializer}} has different default base registrations than before due 
> to FLINK-7420, i.e. if the key of a window is serialized using the 
> {{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
> For 1.3.x, this fix would be an improvement, such that the 
> {{HeapInternalTimerService}} restore will make use of serializer 
> reconfiguration.
> Other remarks:
> * We need to double check all operators that checkpoint / restore from 
> **raw** state. Apparently, the serializer compatibility checks were only 
> implemented for managed state.
> * Migration ITCases apparently do not have enough coverage. A migration test 
> job that uses a key type which required the {{KryoSerializer}}, and uses 
> windows, would have caught this issue.
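The replacement the issue proposes — delegating the restore-time decision to the serializer instead of a strict equals() check — can be sketched like this. The `Serializer` interface and `Result` enum are simplified stand-ins, not Flink's actual {{TypeSerializer}} / compatibility-result API:

```java
// Hypothetical sketch: on restore, ask the new serializer whether it can
// be reconfigured for the restored configuration, rather than requiring
// the restored and new serializers to be equal.
class SerializerCompat {

    enum Result { COMPATIBLE, REQUIRES_MIGRATION }

    interface Serializer {
        /**
         * Decides whether this serializer can read state written under
         * the restored configuration snapshot (possibly after
         * reconfiguring itself).
         */
        Result ensureCompatibility(String restoredConfigSnapshot);
    }

    /** Restore path: delegate to the serializer instead of equals(). */
    static Result checkOnRestore(Serializer newSerializer, String restoredConfigSnapshot) {
        return newSerializer.ensureCompatibility(restoredConfigSnapshot);
    }
}
```

Under this scheme, a serializer whose default registrations changed between versions (as in the Kryo case above) can still declare itself compatible and reconfigure, instead of failing the restore outright.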





[jira] [Commented] (FLINK-8275) Flink YARN deployment with Kerberos enabled not working

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354362#comment-16354362
 ] 

ASF GitHub Bot commented on FLINK-8275:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5172


> Flink YARN deployment with Kerberos enabled not working 
> 
>
> Key: FLINK-8275
> URL: https://issues.apache.org/jira/browse/FLINK-8275
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.4.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The local keytab path in YarnTaskManagerRunner is incorrectly set to the 
> ApplicationMaster's local keytab path. This causes jobs to fail because the 
> TaskManager can't read the keytab.





[jira] [Commented] (FLINK-8472) Extend migration tests for Flink 1.4

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354361#comment-16354361
 ] 

ASF GitHub Bot commented on FLINK-8472:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5364


> Extend migration tests for Flink 1.4
> 
>
> Key: FLINK-8472
> URL: https://issues.apache.org/jira/browse/FLINK-8472
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> The following migration tests should be extended to cover migrating Flink 
> jobs with a 1.4 savepoint.
>  * {{WindowOperatorMigrationTest}}
>  * {{CEPMigrationTest}}
>  * {{StatefulJobSavepointMigrationTestITCase}}
>  * {{FlinkKinesisConsumerMigrationTest}}
>  * {{FlinkKafkaConsumerBaseMigrationTest}}
>  * {{ContinuousFileProcessingMigrationTest}}
>  * {{BucketingSinkMigrationTest}}





[jira] [Commented] (FLINK-8022) Kafka at-least-once tests fail occasionally

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354359#comment-16354359
 ] 

ASF GitHub Bot commented on FLINK-8022:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5316


> Kafka at-least-once tests fail occasionally
> ---
>
> Key: FLINK-8022
> URL: https://issues.apache.org/jira/browse/FLINK-8022
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Nico Kruber
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> {{Kafka09ProducerITCase>KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink}}
>  seems to sporadically fail with missing data, like this execution:
> {code}
> 
> Test 
> testOneToOneAtLeastOnceRegularSink(org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase)
>  is running.
> 
> 17:54:30,195 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
> Creating topic oneToOneTopicRegularSink
> 17:54:30,196 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - In 
> getZKUtils:: zookeeperConnectionString = 127.0.0.1:39436
> 17:54:30,204 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Starting ZkClient event thread.
> 17:54:30,204 INFO  org.I0Itec.zkclient.ZkClient   
>- Waiting for keeper state SyncConnected
> 17:54:30,240 INFO  org.I0Itec.zkclient.ZkClient   
>- zookeeper state changed (SyncConnected)
> 17:54:30,261 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Terminate ZkClient event thread.
> 17:54:30,265 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - Topic 
> oneToOneTopicRegularSink create request is successfully posted
> 17:54:30,366 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - 
> Validating if the topic oneToOneTopicRegularSink has been created or not
> 17:54:30,373 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - In 
> getZKUtils:: zookeeperConnectionString = 127.0.0.1:39436
> 17:54:30,374 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Starting ZkClient event thread.
> 17:54:30,374 INFO  org.I0Itec.zkclient.ZkClient   
>- Waiting for keeper state SyncConnected
> 17:54:30,404 INFO  org.I0Itec.zkclient.ZkClient   
>- zookeeper state changed (SyncConnected)
> 17:54:30,420 INFO  
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl  - topic 
> oneToOneTopicRegularSink has been created successfully
> 17:54:30,421 INFO  org.I0Itec.zkclient.ZkEventThread  
>- Terminate ZkClient event thread.
> 17:54:31,099 INFO  
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - 
> Starting FlinkKafkaProducer (1/1) to produce into default topic 
> oneToOneTopicRegularSink
> 17:55:05,229 ERROR 
> org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase  - 
> 
> Test 
> testOneToOneAtLeastOnceRegularSink(org.apache.flink.streaming.connectors.kafka.Kafka09ProducerITCase)
>  failed with:
> java.lang.AssertionError: Expected to contain all of: <[0, 1, 2, 3, 4, 5, 6, 
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
> 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 
> 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 
> 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 
> 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 
> 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 
> 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 
> 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 
> 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 
> 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 
> 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 
> 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 
> 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 
> 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 
> 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 

[jira] [Commented] (FLINK-8419) Kafka consumer's offset metrics are not registered for dynamically discovered partitions

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354358#comment-16354358
 ] 

ASF GitHub Bot commented on FLINK-8419:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5335


> Kafka consumer's offset metrics are not registered for dynamically discovered 
> partitions
> 
>
> Key: FLINK-8419
> URL: https://issues.apache.org/jira/browse/FLINK-8419
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Metrics
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> Currently, the per-partition offset metrics are registered via the 
> {{AbstractFetcher#addOffsetStateGauge}} method. That method is only ever 
> called for the initial startup partitions, and not for dynamically discovered 
> partitions.
> We should consider adding some unit tests to make sure that metrics are 
> properly registered for all partitions. That would also safeguard us from 
> accidentally removing metrics.
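The intent of the fix can be sketched as a single registration path used for both startup and dynamically discovered partitions; {{MetricRegistry}}, {{addPartition}}, and the gauge names below are illustrative stand-ins, not Flink's actual metrics API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch: every partition, whether known at startup or discovered later,
// goes through one code path that registers its offset gauge.
public class OffsetMetricsSketch {
    static class MetricRegistry {
        final Map<String, LongSupplier> gauges = new HashMap<>();
        void gauge(String name, LongSupplier value) { gauges.put(name, value); }
    }

    final MetricRegistry metrics = new MetricRegistry();
    final Map<Integer, long[]> currentOffsets = new HashMap<>();

    // single registration path shared by startup and dynamic discovery
    void addPartition(int partition) {
        long[] offset = {-1L}; // mutable holder read by the gauge
        currentOffsets.put(partition, offset);
        metrics.gauge("currentOffset.partition-" + partition, () -> offset[0]);
    }

    void updateOffset(int partition, long newOffset) {
        currentOffsets.get(partition)[0] = newOffset;
    }

    public static void main(String[] args) {
        OffsetMetricsSketch fetcher = new OffsetMetricsSketch();
        fetcher.addPartition(0); // startup partition
        fetcher.addPartition(3); // dynamically discovered partition
        fetcher.updateOffset(3, 100L);
        // both partitions expose a gauge, so discovery is covered too
        System.out.println(fetcher.metrics.gauges.get("currentOffset.partition-3").getAsLong());
    }
}
```

A unit test over such a shared path would catch a regression where discovered partitions silently lose their metrics.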



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8409) Race condition in KafkaConsumerThread leads to potential NPE

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354356#comment-16354356
 ] 

ASF GitHub Bot commented on FLINK-8409:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5329


> Race condition in KafkaConsumerThread leads to potential NPE
> 
>
> Key: FLINK-8409
> URL: https://issues.apache.org/jira/browse/FLINK-8409
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.4.0, 1.3.2, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The following lines in the {{KafkaConsumerThread::setOffsetsToCommit(...)}} 
> method suggest a race condition with the asynchronous callback from committing 
> offsets to Kafka:
> {code}
> // record the work to be committed by the main consumer thread
> // and make sure the consumer notices that
> if (nextOffsetsToCommit.getAndSet(offsetsToCommit) != null) {
>     log.warn("Committing offsets to Kafka takes longer than the checkpoint interval. " +
>         "Skipping commit of previous offsets because newer complete checkpoint offsets are available. " +
>         "This does not compromise Flink's checkpoint integrity.");
> }
> this.offsetCommitCallback = commitCallback;
> {code}
> In the consumer's main loop, {{nextOffsetsToCommit}} is checked for 
> offsets to commit. If any are present, an asynchronous offset 
> commit operation is performed. The NPE happens when the commit 
> completes before {{this.offsetCommitCallback = commitCallback;}} has 
> been reached.
> A possible fix is to make setting the next offsets to commit along with the 
> callback instance a single atomic operation.
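A minimal sketch of the suggested atomic fix: the offsets and their callback travel together in one immutable object behind a single {{AtomicReference}}, so the commit path can never observe offsets without the matching callback. All names here are illustrative, not Flink's actual internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: pair the offsets with their callback in one
// atomic reference, replacing the two separate writes that raced.
public class AtomicCommitSketch {
    // offsets and callback travel together as a single immutable pair
    static final class CommitRequest {
        final Map<String, Long> offsets;
        final Runnable callback;
        CommitRequest(Map<String, Long> offsets, Runnable callback) {
            this.offsets = offsets;
            this.callback = callback;
        }
    }

    private final AtomicReference<CommitRequest> nextCommit = new AtomicReference<>();

    void setOffsetsToCommit(Map<String, Long> offsets, Runnable callback) {
        // one atomic swap instead of two independent assignments
        if (nextCommit.getAndSet(new CommitRequest(offsets, callback)) != null) {
            System.out.println("Skipping commit of previous offsets; newer offsets available.");
        }
    }

    // called from the consumer thread's main loop
    CommitRequest pollCommit() {
        return nextCommit.getAndSet(null);
    }

    public static void main(String[] args) {
        AtomicCommitSketch s = new AtomicCommitSketch();
        Map<String, Long> offs = new HashMap<>();
        offs.put("topic-0", 42L);
        s.setOffsetsToCommit(offs, () -> System.out.println("committed"));
        CommitRequest r = s.pollCommit();
        // the callback is guaranteed non-null whenever offsets are present
        r.callback.run();
    }
}
```

Pairing the two fields removes the window between {{getAndSet}} and the callback assignment that the report describes.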





[jira] [Commented] (FLINK-6004) Allow FlinkKinesisConsumer to skip corrupted messages

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354357#comment-16354357
 ] 

ASF GitHub Bot commented on FLINK-6004:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5269


> Allow FlinkKinesisConsumer to skip corrupted messages
> -
>
> Key: FLINK-6004
> URL: https://issues.apache.org/jira/browse/FLINK-6004
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
>
> It is quite clear from the fix for FLINK-3679 that, in practice, users may 
> encounter corrupted messages from Kafka, Kinesis, or external sources in 
> general when deserializing them.
> The consumers should support simply skipping such messages by letting the 
> deserialization schema return {{null}} and checking for {{null}} values within 
> the consumer.
> This has been done for the Kafka consumer already. This ticket tracks the 
> improvement for the Kinesis consumer.
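The skipping contract described above can be sketched as follows; the {{Deserializer}} interface and {{consume}} loop are simplified stand-ins for the actual Flink deserialization schema and consumer internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the null-skipping contract: the schema returns null for a
// corrupted record and the consume loop drops it instead of failing.
public class SkipCorruptedSketch {
    interface Deserializer<T> {
        T deserialize(byte[] bytes); // null signals a corrupted message
    }

    static <T> List<T> consume(List<byte[]> records, Deserializer<T> schema) {
        List<T> out = new ArrayList<>();
        for (byte[] record : records) {
            T value = schema.deserialize(record);
            if (value == null) {
                continue; // skip corrupted message, keep the job running
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // hypothetical schema: parse integers, treat unparseable bytes as corrupted
        Deserializer<Integer> schema = bytes -> {
            try {
                return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
            } catch (NumberFormatException e) {
                return null;
            }
        };
        List<byte[]> input = new ArrayList<>();
        input.add("1".getBytes(StandardCharsets.UTF_8));
        input.add("corrupt".getBytes(StandardCharsets.UTF_8));
        input.add("3".getBytes(StandardCharsets.UTF_8));
        System.out.println(consume(input, schema)); // prints [1, 3]
    }
}
```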





[GitHub] flink pull request #5364: [FLINK-8472] [test] Extend all migration tests for...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5364


---


[GitHub] flink pull request #5362: [FLINK-8421] [DataStream] Make timer serializers r...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5362


---


[GitHub] flink pull request #5337: [FLINK-8484][flink-kinesis-connector] Ensure a Kin...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5337


---


[GitHub] flink pull request #5269: [FLINK-6004] [kinesis] Allow FlinkKinesisConsumer ...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5269


---


[GitHub] flink pull request #5335: (master) [FLINK-8419] [kafka] Register metrics for...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5335


---


[GitHub] flink pull request #5380: [hotfix][connectors] Fix log format strings

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5380


---


[GitHub] flink pull request #5172: [FLINK-8275] [Security] fix keytab local path in Y...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5172


---


[GitHub] flink pull request #5316: [FLINK-8022][kafka-tests] Disable at-least-once te...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5316


---


[GitHub] flink pull request #5268: [FLINK-8398] [kinesis, tests] Harden KinesisDataFe...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5268


---


[GitHub] flink pull request #5329: [FLINK-8409] [kafka] Fix potential NPE in KafkaCon...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5329


---


[jira] [Updated] (FLINK-8570) Provide a hook in Flink KafkaConsumer(source function) implementation to override assignment of kafka partitions to individual task nodes

2018-02-06 Thread Nagarjun Guraja (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nagarjun Guraja updated FLINK-8570:
---
Description: This hook can be leveraged to co-assign and manage the 
association between key groups and Kafka partitions. This is required to support 
use cases where streams are pre-partitioned at the Kafka layer

> Provide a hook in Flink KafkaConsumer(source function) implementation to 
> override assignment of kafka partitions to individual task nodes
> -
>
> Key: FLINK-8570
> URL: https://issues.apache.org/jira/browse/FLINK-8570
> Project: Flink
>  Issue Type: Improvement
>Reporter: Nagarjun Guraja
>Priority: Major
>
> This hook can be leveraged to co-assign and manage the association 
> between key groups and Kafka partitions. This is required to support use 
> cases where streams are pre-partitioned at the Kafka layer
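A minimal sketch of such a hook, assuming a pluggable assigner interface that maps a Kafka partition to a consumer subtask; the interface name and the modulo-based default are illustrative, not Flink's actual assignment code.

```java
// Sketch of the requested hook: a pluggable assigner mapping a Kafka
// partition to a consumer subtask.
public class PartitionAssignerSketch {
    interface KafkaPartitionAssigner {
        int assign(int partition, int numSubtasks);
    }

    // a simple modulo-based default
    static final KafkaPartitionAssigner DEFAULT = (p, n) -> p % n;

    public static void main(String[] args) {
        // a job could swap in an assigner that co-locates partitions with
        // the key groups they were pre-partitioned for
        System.out.println(DEFAULT.assign(7, 4)); // prints 3
    }
}
```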





[jira] [Created] (FLINK-8571) Provide an enhanced KeyedStream implementation to use ForwardPartitioner

2018-02-06 Thread Nagarjun Guraja (JIRA)
Nagarjun Guraja created FLINK-8571:
--

 Summary: Provide an enhanced KeyedStream implementation to use 
ForwardPartitioner
 Key: FLINK-8571
 URL: https://issues.apache.org/jira/browse/FLINK-8571
 Project: Flink
  Issue Type: Improvement
Reporter: Nagarjun Guraja


This enhancement would help in modeling problems with pre-partitioned input 
sources (e.g. Kafka with keyed topics). It would help make the job 
graph embarrassingly parallel while leveraging the RocksDB state backend and 
fine-grained recovery semantics.





[jira] [Created] (FLINK-8570) Provide a hook in Flink KafkaConsumer(source function) implementation to override assignment of kafka partitions to individual task nodes

2018-02-06 Thread Nagarjun Guraja (JIRA)
Nagarjun Guraja created FLINK-8570:
--

 Summary: Provide a hook in Flink KafkaConsumer(source function) 
implementation to override assignment of kafka partitions to individual task 
nodes
 Key: FLINK-8570
 URL: https://issues.apache.org/jira/browse/FLINK-8570
 Project: Flink
  Issue Type: Improvement
Reporter: Nagarjun Guraja








[jira] [Created] (FLINK-8569) Provide a hook to override the default KeyGroupRangeAssignment

2018-02-06 Thread Nagarjun Guraja (JIRA)
Nagarjun Guraja created FLINK-8569:
--

 Summary: Provide a hook to override the default 
KeyGroupRangeAssignment
 Key: FLINK-8569
 URL: https://issues.apache.org/jira/browse/FLINK-8569
 Project: Flink
  Issue Type: Improvement
Reporter: Nagarjun Guraja


Currently the class 'org.apache.flink.runtime.state.KeyGroupRangeAssignment' 
has static methods (not pluggable), which is somewhat prohibitive and unintuitive 
when onboarding some keyed, embarrassingly parallel jobs. It would be helpful 
if a per-job hook were provided to make this pluggable.
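For context, the assignment scheme being made pluggable can be sketched as follows. The subtask formula mirrors Flink's documented key-group scheme; the key hash is simplified (Flink applies a murmur hash on top of {{hashCode()}}), and the hook interface is hypothetical.

```java
// Sketch of the two-level scheme the issue wants pluggable: a key hashes
// to a key group, and key groups map to operator subtasks.
public class KeyGroupSketch {
    interface KeyGroupAssigner { // the proposed per-job hook
        int assignToKeyGroup(Object key, int maxParallelism);
    }

    // simplified default assignment: hash modulo the number of key groups
    static final KeyGroupAssigner DEFAULT =
        (key, maxParallelism) -> Math.abs(key.hashCode() % maxParallelism);

    // key group -> subtask, as in Flink's range assignment
    static int keyGroupToSubtask(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 4;
        int keyGroup = DEFAULT.assignToKeyGroup("user-42", maxParallelism);
        System.out.println("key group " + keyGroup + " -> subtask "
            + keyGroupToSubtask(keyGroup, maxParallelism, parallelism));
    }
}
```

A per-job hook would let a pre-partitioned source supply its own {{KeyGroupAssigner}} so keys land on the subtask already holding their data.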





[jira] [Commented] (FLINK-8101) Elasticsearch 6.x support

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354203#comment-16354203
 ] 

ASF GitHub Bot commented on FLINK-8101:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5374
  
@cjolif sorry, I haven't had the chance yet.

But I agree that we should try getting this in for 1.5 (I saw your reply on 
the 1.5 release discussion thread in the mailing lists). I'll try to get back 
to this as soon as possible.


> Elasticsearch 6.x support
> -
>
> Key: FLINK-8101
> URL: https://issues.apache.org/jira/browse/FLINK-8101
> Project: Flink
>  Issue Type: New Feature
>  Components: ElasticSearch Connector
>Affects Versions: 1.4.0
>Reporter: Hai Zhou UTC+8
>Assignee: Flavio Pompermaier
>Priority: Major
> Fix For: 1.5.0
>
>
> Recently, Elasticsearch 6.0.0 was released: 
> https://www.elastic.co/blog/elasticsearch-6-0-0-released
> The minimum Elasticsearch Java Client version compatible with ES6 is 5.6.0.





[GitHub] flink issue #5374: [FLINK-8101][flink-connectors] Elasticsearch 5.3+ (Transp...

2018-02-06 Thread tzulitai
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5374
  
@cjolif sorry, I haven't had the chance yet.

But I agree that we should try getting this in for 1.5 (I saw your reply on 
the 1.5 release discussion thread in the mailing lists). I'll try to get back 
to this as soon as possible.


---


[jira] [Comment Edited] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.

2018-02-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354179#comment-16354179
 ] 

Aljoscha Krettek edited comment on FLINK-7756 at 2/6/18 5:25 PM:
-

You can use {{1.5-SNAPSHOT}} as the version and add our snapshots repository, 
as described here: 
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

There is also a link to download a nightly flink distribution, which is 
1.5-SNAPSHOT, currently.


was (Author: aljoscha):
You can use {{1.5-SNAPSHOT}} as the version and add our snapshots repository, 
as described here: 
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

There is also a link to download a nightly flink distribution, which is 
1.5-SNAPSHOT, currently.

> RocksDB state backend Checkpointing (Async and Incremental)  is not working 
> with CEP.
> -
>
> Key: FLINK-7756
> URL: https://issues.apache.org/jira/browse/FLINK-7756
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP, State Backends, Checkpointing, Streaming
>Affects Versions: 1.4.0, 1.3.2
> Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>Reporter: Shashank Agarwal
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> When I try to use the RocksDBStateBackend on my staging cluster (which uses 
> HDFS as its file system) it crashes, but when I use the FsStateBackend on 
> staging (also with HDFS as the file system) it works fine.
> Locally, with the local file system, it works fine in both cases.
> Please check the attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO  com.datastax.driver.core.NettyUtil  
>   - Found Netty's native epoll transport in the classpath, using 
> it
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Co-Flat Map (1/2) 
> (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Map (2/2) 
> (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Co-Flat Map (2/2) 
> (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched 
> from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Co-Flat Map (1/2).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Co-Flat Map (1/2).
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>   ... 5 more
>   Suppressed: java.lang.Exception: Could not properly cancel managed 
> keyed state future.
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>   at 
> org.a

[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.

2018-02-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354183#comment-16354183
 ] 

Aljoscha Krettek commented on FLINK-6321:
-

You can use {{1.5-SNAPSHOT}} as the version and add our snapshots repository, 
as described here: 
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

There is also a link to download a nightly flink distribution, which is 
1.5-SNAPSHOT, currently.

> RocksDB state backend Checkpointing is not working with KeyedCEP.
> -
>
> Key: FLINK-6321
> URL: https://issues.apache.org/jira/browse/FLINK-6321
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP
>Affects Versions: 1.2.0
> Environment: yarn-cluster, RocksDB State backend, Checkpointing every 
> 1000 ms
>Reporter: Shashank Agarwal
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> Checkpointing is not working with the RocksDBStateBackend when using CEP. It 
> works fine with the FsStateBackend and MemoryStateBackend. The application fails 
> every time.
> {code}
> 04/18/2017 21:53:20   Job execution switched to status FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 
> 46 for operator KeyedCEPPatternOperator -> Map (1/4).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 46 for 
> operator KeyedCEPPatternOperator -> Map (1/4).
>   ... 6 more
> Caused by: java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
>   ... 5 more
> {code}





[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.

2018-02-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354179#comment-16354179
 ] 

Aljoscha Krettek commented on FLINK-7756:
-

You can use {{1.5-SNAPSHOT}} as the version and add our snapshots repository, 
as described here: 
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds

There is also a link to download a nightly flink distribution, which is 
1.5-SNAPSHOT, currently.

> RocksDB state backend Checkpointing (Async and Incremental)  is not working 
> with CEP.
> -
>
> Key: FLINK-7756
> URL: https://issues.apache.org/jira/browse/FLINK-7756
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP, State Backends, Checkpointing, Streaming
>Affects Versions: 1.4.0, 1.3.2
> Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend
>Reporter: Shashank Agarwal
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> When I try to use the RocksDBStateBackend on my staging cluster (which uses 
> HDFS as its file system) it crashes, but when I use the FsStateBackend on 
> staging (also with HDFS as the file system) it works fine.
> Locally, with the local file system, it works fine in both cases.
> Please check the attached logs. I have around 20-25 tasks in my app.
> {code:java}
> 2017-09-29 14:21:31,639 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=0).
> 2017-09-29 14:21:31,640 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,020 INFO  
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink  - No state 
> to restore for the BucketingSink (taskIdx=1).
> 2017-09-29 14:21:32,022 INFO  
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - 
> Initializing RocksDB keyed state backend from snapshot.
> 2017-09-29 14:21:32,078 INFO  com.datastax.driver.core.NettyUtil  
>   - Found Netty's native epoll transport in the classpath, using 
> it
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Co-Flat Map (1/2) 
> (b879f192c4e8aae6671cdafb3a24c00a).
> 2017-09-29 14:21:34,177 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Map (2/2) 
> (1ea5aef6ccc7031edc6b37da2912d90b).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally Co-Flat Map (2/2) 
> (4bac8e764c67520d418a4c755be23d4d).
> 2017-09-29 14:21:34,178 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched 
> from RUNNING to FAILED.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 
> for operator Co-Flat Map (1/2).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2 for 
> operator Co-Flat Map (1/2).
>   ... 6 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897)
>   ... 5 more
>   Suppressed: java.lang.Exception: Could not properly cancel managed 
> keyed state future.
>   at 
> org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961)
>   ... 5 more
>   Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.Futu

[jira] [Closed] (FLINK-8568) flink-table's guava classes can't be relocated by maven-shade-plugin

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8568.
---
Resolution: Duplicate

> flink-table's guava classes can't be relocated by maven-shade-plugin
> --
>
> Key: FLINK-8568
> URL: https://issues.apache.org/jira/browse/FLINK-8568
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> The maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> to decompile the flink-table jar and found that the guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the Google guava and the 
> Flink-shaded guava at the same time. Why is the Google guava still contained in 
> Flink?
> Because we build Flink with the maven-assembly-plugin in the flink-dist module. 
> The Assembly Plugin aggregates all of its dependencies, including those that are 
> already shaded. If we use the following patch, which excludes Guava, then 
> flink-dist_${scala.binary.version}.jar contains only the flink-shaded-guava, not 
> the Google guava.





[jira] [Updated] (FLINK-8568) flink-table's guava classes can't be relocated by maven-shade-plugin

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-8568:

Affects Version/s: (was: 2.0.0)

> flink-table's guava classes can't be relocated by maven-shade-plugin
> --
>
> Key: FLINK-8568
> URL: https://issues.apache.org/jira/browse/FLINK-8568
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> The maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> to decompile the flink-table jar and found that the guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the Google guava and the 
> Flink-shaded guava at the same time. Why is the Google guava still contained in 
> Flink?
> Because we build Flink with the maven-assembly-plugin in the flink-dist module. 
> The Assembly Plugin aggregates all of its dependencies, including those that are 
> already shaded. If we use the following patch, which excludes Guava, then 
> flink-dist_${scala.binary.version}.jar contains only the flink-shaded-guava, not 
> the Google guava.





[jira] [Commented] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354159#comment-16354159
 ] 

Chesnay Schepler commented on FLINK-8567:
-

Please see the instructions at 
https://ci.apache.org/projects/flink/flink-docs-master/start/building.html#dependency-shading
 and see if they resolve your issue.

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: exclude_guava.patch
>
>
> The maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> to decompile the flink-table jar and found that the guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the Google guava and the 
> Flink-shaded guava at the same time. Why is the Google guava still contained in 
> Flink?
> Because we build Flink with the maven-assembly-plugin in the flink-dist module. 
> The Assembly Plugin aggregates all of its dependencies, including those that are 
> already shaded. If we use the following patch, which excludes Guava, then 
> flink-dist_${scala.binary.version}.jar contains only the flink-shaded-guava, not 
> the Google guava.





[jira] [Updated] (FLINK-8567) Maven-shade-plugin can't relocate Scala classes.

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-8567:

Affects Version/s: (was: 2.0.0)

> Maven-shade-plugin can't relocate Scala classes.
> 
>
> Key: FLINK-8567
> URL: https://issues.apache.org/jira/browse/FLINK-8567
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: exclude_guava.patch
>
>
> The maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> to decompile the flink-table jar and found that the guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the Google guava and the 
> Flink-shaded guava at the same time. Why is the Google guava still contained in 
> Flink?
> Because we build Flink with the maven-assembly-plugin in the flink-dist module. 
> The Assembly Plugin aggregates all of its dependencies, including those that are 
> already shaded. If we use the following patch, which excludes Guava, then 
> flink-dist_${scala.binary.version}.jar contains only the flink-shaded-guava, not 
> the Google guava.





[jira] [Updated] (FLINK-8568) flink-table's guava classes can't be relocated by maven-shade-plugin

2018-02-06 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-8568:

Fix Version/s: (was: 2.0.0)

> flink-table's guava classes can't be relocated by maven-shade-plugin
> --
>
> Key: FLINK-8568
> URL: https://issues.apache.org/jira/browse/FLINK-8568
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.5.0
>Reporter: John Fang
>Priority: Major
> Attachments: exclude_guava.patch
>
>
> The maven-shade-plugin only relocates Java classes, not Scala classes. I tried 
> to decompile the flink-table jar and found that the guava paths had not been 
> relocated in the Scala classes. Yet Flink jobs always run normally, because 
> flink-dist_${scala.binary.version}.jar contains both the Google guava and the 
> Flink-shaded guava at the same time. Why is the Google guava still contained in 
> Flink?
> Because we build Flink with the maven-assembly-plugin in the flink-dist module. 
> The Assembly Plugin aggregates all of its dependencies, including those that are 
> already shaded. If we use the following patch, which excludes Guava, then 
> flink-dist_${scala.binary.version}.jar contains only the flink-shaded-guava, not 
> the Google guava.





[jira] [Commented] (FLINK-6321) RocksDB state backend Checkpointing is not working with KeyedCEP.

2018-02-06 Thread Shashank Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354152#comment-16354152
 ] 

Shashank Agarwal commented on FLINK-6321:
-

Sure, I'll check with the 1.5 snapshot. The only thing is that during testing on 
the cluster I have to resolve the dependencies. Can you give me a clue or a link 
on how I can test that on the cluster with the dependencies? I am using SBT. Is 
there any script in tools that I can use to publish the dependencies?


On Tue, 6 Feb 2018 at 10:24 PM, Aljoscha Krettek (JIRA) 



> RocksDB state backend Checkpointing is not working with KeyedCEP.
> -
>
> Key: FLINK-6321
> URL: https://issues.apache.org/jira/browse/FLINK-6321
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP
>Affects Versions: 1.2.0
> Environment: yarn-cluster, RocksDB State backend, Checkpointing every 
> 1000 ms
>Reporter: Shashank Agarwal
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> Checkpointing is not working with RocksDBStateBackend when using CEP. It's 
> working fine with FsStateBackend and MemoryStateBackend. Application failing 
> every-time.
> {code}
> 04/18/2017 21:53:20   Job execution switched to status FAILING.
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 
> 46 for operator KeyedCEPPatternOperator -> Map (1/4).}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Could not materialize checkpoint 46 for 
> operator KeyedCEPPatternOperator -> Map (1/4).
>   ... 6 more
> Caused by: java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
>   ... 5 more
> {code}





[jira] [Closed] (FLINK-8501) Use single BlobCacheService per TaskExecutor

2018-02-06 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-8501.

Resolution: Fixed

Fixed via 22bf3861c95ed93b2eb5f8d0661d282ef6cdbbc8

> Use single BlobCacheService per TaskExecutor
> 
>
> Key: FLINK-8501
> URL: https://issues.apache.org/jira/browse/FLINK-8501
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Currently, the {{TaskExecutor}} creates a new {{BlobCacheService}} for each 
> new {{JobManagerConnection}}. This is wasteful and, moreover, gives only 
> access to the {{BlobService}} after a connection to a {{JobMaster}} has been 
> established. Due to this, it is not possible to upload the {{TaskExecutor}} 
> logs before a {{TaskExecutor}} is used by a {{JobMaster}}.
>  
> Since the {{BlobServer}} address is something cluster specific and not 
> {{JobMaster}} specific, I propose to propagate this information when the 
> {{TaskExecutor}} registers at the {{ResourceManager}}. Moreover, I propose to 
> make the {{BlobCacheService}} reusable in case of a {{BlobServer}} address 
> change (e.g. failover) by allowing to change the {{BlobServer}} address.
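The reuse across address changes described above can be sketched as follows; the names (ReusableBlobCache, setBlobServerAddress) are illustrative assumptions, not Flink's actual API:

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: a single cache service per TaskExecutor whose
// BlobServer address can be swapped on failover, so the same instance
// can be shared by all JobManager connections.
final class ReusableBlobCache {
    private volatile InetSocketAddress serverAddress;

    ReusableBlobCache(InetSocketAddress initialAddress) {
        this.serverAddress = initialAddress;
    }

    /** Called when the TaskExecutor learns a new BlobServer address,
     *  e.g. after the ResourceManager reports a failover. */
    void setBlobServerAddress(InetSocketAddress newAddress) {
        this.serverAddress = newAddress;
    }

    InetSocketAddress currentAddress() {
        return serverAddress;
    }
}

public class BlobCacheSketch {
    public static void main(String[] args) {
        ReusableBlobCache cache =
            new ReusableBlobCache(new InetSocketAddress("jm-1", 6124));
        // Failover: same cache instance, new server address.
        cache.setBlobServerAddress(new InetSocketAddress("jm-2", 6124));
        System.out.println(cache.currentAddress().getHostString()); // prints jm-2
    }
}
```

The volatile field stands in for whatever synchronization the real implementation would need when connections observe the address mid-failover.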





[jira] [Closed] (FLINK-8503) Port TaskManagerLogHandler to new REST endpoint

2018-02-06 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-8503.

Resolution: Fixed

Fixed via 5f573804f0f486db7cc594ae678adff5ae7f217c

> Port TaskManagerLogHandler to new REST endpoint
> ---
>
> Key: FLINK-8503
> URL: https://issues.apache.org/jira/browse/FLINK-8503
> Project: Flink
>  Issue Type: Sub-task
>  Components: REST
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> In order to serve {{TaskExecutor}} log stdout files, we have to port the 
> {{TaskManagerLogHandler}} to the new REST endpoint.
> In order to properly support serving of files, I propose to introduce an 
> {{AbstractHandler}} which takes a typed request but has not typed response. 
> That way we can easily output the file contents.
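The "typed request, untyped response" idea can be sketched like this; the class and method names are illustrative assumptions, not Flink's actual handler hierarchy:

```java
// Hypothetical sketch of the handler shape described above: the request
// is still parsed into a typed object, but the response is raw bytes
// (file contents) rather than a typed JSON message.
abstract class AbstractFileServingHandler<R> {
    /** Parse the typed request, e.g. from a path parameter. */
    abstract R parseRequest(String pathParam);

    /** Produce the untyped response: the raw file contents. */
    abstract byte[] respond(R request) throws Exception;

    /** Template method tying the two steps together. */
    final byte[] handle(String pathParam) throws Exception {
        return respond(parseRequest(pathParam));
    }
}

class TaskManagerLogHandlerSketch extends AbstractFileServingHandler<String> {
    @Override
    String parseRequest(String pathParam) {
        return pathParam; // the TaskExecutor id
    }

    @Override
    byte[] respond(String taskManagerId) {
        // A real endpoint would stream the log/stdout file of the given
        // TaskExecutor; placeholder bytes stand in for the file here.
        return ("log of " + taskManagerId).getBytes();
    }
}

public class FileHandlerSketch {
    public static void main(String[] args) throws Exception {
        byte[] body = new TaskManagerLogHandlerSketch().handle("taskmanager-1");
        System.out.println(new String(body)); // prints log of taskmanager-1
    }
}
```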





[jira] [Closed] (FLINK-8502) Remove LibraryCacheManager from JobMaster

2018-02-06 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-8502.

Resolution: Fixed

Fixed via 2af2b7347c7036bae4b1fb5f37782eccc6749911

> Remove LibraryCacheManager from JobMaster
> -
>
> Key: FLINK-8502
> URL: https://issues.apache.org/jira/browse/FLINK-8502
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> The {{JobMaster}} does not need access to the {{LibraryCacheManager}} because 
> it is already started with the user code class loader. We should, therefore, 
> remove the unused components.





[jira] [Resolved] (FLINK-7856) Port JobVertexBackPressureHandler to REST endpoint

2018-02-06 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-7856.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Fixed via 

7137d619456be401cb8a7b867daa98eb80f3838c

c1280a5056196e2c20e50dc6c3631ca93c877e58

> Port JobVertexBackPressureHandler to REST endpoint
> --
>
> Key: FLINK-7856
> URL: https://issues.apache.org/jira/browse/FLINK-7856
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination, REST, Webfrontend
>Reporter: Fang Yong
>Assignee: Gary Yao
>Priority: Major
> Fix For: 1.5.0
>
>
> Port JobVertexBackPressureHandler to REST endpoint




