[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers

2019-04-28 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828949#comment-16828949
 ] 

Till Rohrmann commented on FLINK-12342:
---

Before diving into the implementation, I would first like to fully understand 
the problem. Concretely, a Yarn reference explaining this behaviour would be 
helpful. Otherwise we might simply fix a symptom, or not the problem at all.

> Yarn Resource Manager Acquires Too Many Containers
> --
>
> Key: FLINK-12342
> URL: https://issues.apache.org/jira/browse/FLINK-12342
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Environment: We run jobs in Flink release 1.6.3. 
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the current implementation of YarnFlinkResourceManager, it starts to 
> acquire new containers one by one when it gets a request from the 
> SlotManager. This mechanism works when the job is small, say fewer than 32 
> containers. If the job has 256 containers, containers can't be allocated 
> immediately, and pending requests in the AMRMClient are not removed 
> accordingly. We observed that the AMRMClient asks for the current pending 
> requests + 1 (the new request from the slot manager) containers. In this 
> way, during the start-up of such a job, it asked for 4000+ containers. If an 
> external dependency issue happens, for example slow HDFS access, the whole 
> job is blocked without getting enough resources and is finally killed with a 
> SlotManager request timeout.
> Thus, we should use the total number of containers requested, rather than 
> the pending requests in the AMRMClient, as the threshold for deciding 
> whether we need to add one more resource request.
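The proposed accounting can be sketched as follows. This is a minimal sketch with hypothetical names; the actual YarnFlinkResourceManager bookkeeping differs.

```java
// Minimal sketch of the proposed fix: track the total number of containers
// already requested, instead of relying on the AMRMClient's pending-request
// count. All names here are hypothetical, not the actual Flink code.
class ContainerRequestTracker {
    int containersRequested; // total containers asked from YARN so far
    int containersNeeded;    // total containers the SlotManager has asked for

    // Called once per new slot request from the SlotManager.
    void onSlotRequest() {
        containersNeeded++;
        // Decide using the running total, so slow allocation cannot cause the
        // same need to be re-requested thousands of times.
        if (containersRequested < containersNeeded) {
            containersRequested++;
            // amrmClient.addContainerRequest(...) would go here.
        }
    }
}
```

With this bookkeeping, 256 slot requests produce exactly 256 container requests, regardless of how slowly YARN fulfils them.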



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] beyond1920 commented on issue #8294: [FLINK-12348][table-planner-blink]Use TableConfig in api module to replace TableConfig in blink-planner module.

2019-04-28 Thread GitBox
beyond1920 commented on issue #8294: [FLINK-12348][table-planner-blink]Use 
TableConfig in api module to replace TableConfig in blink-planner module.
URL: https://github.com/apache/flink/pull/8294#issuecomment-487457504
 
 
   @KurtYoung , after discussing with @hequn8128 offline, the final TableConfig 
has two levels of configuration (BTW, there is an unused field 
`userDefinedConfig` in the current `TableConfig`; @hequn8128 will delete it 
soon):
   * common configuration shared among multiple planners.
   * configuration for each specific planner, which implements the interface 
`PlannerConfig`. For example, introduce a `PlannerConfigImpl` in the Blink 
planner module for the Blink planner.
   
   Each planner may have different ConfigOptions, which could live in the API 
package of each planner module while multiple planners exist at the same time.
   After the current planner is deprecated and only one planner is retained, we 
could move its `PlannerConfig` implementation and its `ConfigOptions` to the 
API module.
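The two-level shape being described can be sketched roughly as below. Only `TableConfig` and `PlannerConfig` are names from this thread; the field names and the Blink-specific option are made up for illustration.

```java
// Illustrative sketch of the two-level configuration discussed above.
// Only `TableConfig` and `PlannerConfig` come from the discussion; the
// field names and the Blink option are hypothetical.
interface PlannerConfig { }  // marker interface for planner-specific settings

class BlinkPlannerConfigSketch implements PlannerConfig {
    // A hypothetical option that only the Blink planner understands.
    int maxGeneratedCodeLength = 64000;
}

class TableConfigSketch {
    // Level 1: configuration common to all planners.
    String localTimeZone = "UTC";
    // Level 2: planner-specific configuration behind the PlannerConfig interface.
    PlannerConfig plannerConfig = new BlinkPlannerConfigSketch();
}
```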


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module

2019-04-28 Thread GitBox
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database 
related operations in GenericHiveMetastoreCatalog and setup 
flink-connector-hive module
URL: https://github.com/apache/flink/pull/8205#issuecomment-487447400
 
 
   @flinkbot attention @zentol 




[GitHub] [flink] flinkbot commented on issue #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim…

2019-04-28 Thread GitBox
flinkbot commented on issue #8308: [FLINK-12361][Runtime/Operators]remove 
useless expression from runtim…
URL: https://github.com/apache/flink/pull/8308#issuecomment-487442905
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
   The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   
   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] WeiZhong94 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on issue #8267: [FLINK-12311][table][python] Add base 
python framework and Add Scan, Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#issuecomment-487442930
 
 
   @sunjincheng121  thanks for your review! I have rebased this PR and 
addressed your comments.




[jira] [Updated] (FLINK-12361) Remove useless expression from runtime scheduler

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12361:
---
Labels: pull-request-available  (was: )

> Remove useless expression from runtime scheduler
> 
>
> Key: FLINK-12361
> URL: https://issues.apache.org/jira/browse/FLINK-12361
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Operators
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2019-04-29-11-16-13-492.png
>
>
> In the scheduleTask method of the Scheduler class, the expression 
> forceExternalLocation is useless, since it always evaluates to false:
>  !image-2019-04-29-11-16-13-492.png! 
> So it can be removed. Moreover, removing this expression makes the code 
> structure much simpler, because some branches relying on this expression 
> can also be removed. 





[GitHub] [flink] liyafan82 opened a new pull request #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim…

2019-04-28 Thread GitBox
liyafan82 opened a new pull request #8308: 
[FLINK-12361][Runtime/Operators]remove useless expression from runtim…
URL: https://github.com/apache/flink/pull/8308
 
 
   …e scheduler
   
   
   
   ## What is the purpose of the change
   
   Remove useless expression forceExternalLocation from method 
Scheduler#scheduleTask
   
   ## Brief change log
   
 - Remove this expression from the method.
 - Remove the conditional branches dependent on this expression, since 
this expression always evaluates to false.
   
   
   ## Verifying this change
   
   Verified by manually running the tests in flink-runtime.
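The kind of simplification involved can be sketched like this (illustrative only, not the actual `Scheduler#scheduleTask` source):

```java
// Illustrative pattern only, not the real Scheduler#scheduleTask code.
class DeadBranchSketch {
    // Before: a flag that is never set to true anywhere in the code base.
    static String scheduleBefore(String task) {
        boolean forceExternalLocation = false; // always false
        if (forceExternalLocation) {
            return "external:" + task; // dead branch, never taken
        }
        return "scheduled:" + task;
    }

    // After: the always-false flag and its branch are removed outright.
    static String scheduleAfter(String task) {
        return "scheduled:" + task;
    }
}
```

Since the branch can never execute, both versions behave identically, which is what the manual test run checks.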
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 
   




[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on a change in pull request #8267: 
[FLINK-12311][table][python] Add base python framework and Add Scan, 
Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#discussion_r279230381
 
 

 ##
 File path: 
flink-python/flink-python-table/src/main/python/pyflink/table/__init__.py
 ##
 @@ -0,0 +1,37 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from pyflink.table.table import Table
+from pyflink.table.table_config import TableConfig
+from pyflink.table.table_environment import TableEnvironment, 
StreamTableEnvironment, BatchTableEnvironment
+from pyflink.table.table_sink import TableSink, CsvTableSink
+from pyflink.table.table_source import TableSource, CsvTableSource
+from pyflink.table.types import DataTypes
+
 
 Review comment:
   Fixed




[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on a change in pull request #8267: 
[FLINK-12311][table][python] Add base python framework and Add Scan, 
Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#discussion_r279230371
 
 

 ##
 File path: flink-python/flink-python-table/src/main/python/setup.py
 ##
 @@ -0,0 +1,32 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+from distutils.core import setup
+
+setup(
 
 Review comment:
   Fixed. Now setup.py checks whether the Python version is above 2.7.




[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on a change in pull request #8267: 
[FLINK-12311][table][python] Add base python framework and Add Scan, 
Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#discussion_r279230215
 
 

 ##
 File path: flink-python/flink-python-table/src/main/python/setup.py
 ##
 @@ -0,0 +1,32 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+from distutils.core import setup
+
+setup(
+name='pyflink',
+version='1.0',
 
 Review comment:
   Yes, it makes sense to me. Fixed




[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on a change in pull request #8267: 
[FLINK-12311][table][python] Add base python framework and Add Scan, 
Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#discussion_r279230115
 
 

 ##
 File path: flink-python/flink-python-table/src/main/python/setup.py
 ##
 @@ -0,0 +1,32 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+from distutils.core import setup
+
+setup(
+name='pyflink',
+version='1.0',
+packages=['pyflink',
+  'pyflink.table',
+  'pyflink.util'],
+url='http://flink.apache.org',
+license='Licensed under the Apache License, Version 2.0',
+author='Flink Developers',
+author_email='d...@flink.apache.org',
+install_requires=['py4j==0.10.8.1'],
+description='Flink Python API'
 
 Review comment:
   Fixed




[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support

2019-04-28 Thread GitBox
WeiZhong94 commented on a change in pull request #8267: 
[FLINK-12311][table][python] Add base python framework and Add Scan, 
Projection, and Filter operator support
URL: https://github.com/apache/flink/pull/8267#discussion_r279230110
 
 

 ##
 File path: docs/dev/table/tableApi.md
 ##
 @@ -297,6 +297,81 @@ val result = orders.where('b === "red")
 
   
 
+
 
 Review comment:
   I agree that a README.md file is necessary. I added it in the new commit.




[jira] [Created] (FLINK-12361) Remove useless expression from runtime scheduler

2019-04-28 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12361:


 Summary: Remove useless expression from runtime scheduler
 Key: FLINK-12361
 URL: https://issues.apache.org/jira/browse/FLINK-12361
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Operators
Reporter: Liya Fan
Assignee: Liya Fan
 Attachments: image-2019-04-29-11-16-13-492.png

In the scheduleTask method of the Scheduler class, the expression 
forceExternalLocation is useless, since it always evaluates to false:

 !image-2019-04-29-11-16-13-492.png! 

So it can be removed. Moreover, removing this expression makes the code 
structure much simpler, because some branches relying on this expression can 
also be removed. 





[GitHub] [flink] lamber-ken commented on a change in pull request #8268: [FLINK-12246][runtime] Fix can't get max_attempts_history_size value when create ExecutionVertex

2019-04-28 Thread GitBox
lamber-ken commented on a change in pull request #8268: [FLINK-12246][runtime] 
Fix can't get max_attempts_history_size value when create ExecutionVertex
URL: https://github.com/apache/flink/pull/8268#discussion_r279229762
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
 ##
 @@ -816,6 +817,9 @@ public void attachJobGraph(List 
topologiallySorted) throws JobExcepti
 
final ArrayList newExecJobVertices = new 
ArrayList<>(topologiallySorted.size());
final long createTimestamp = System.currentTimeMillis();
+   int maxPriorAttemptsHistoryLength = 
jobInformation.getJobConfiguration().getInteger(
 
 Review comment:
   @tillrohrmann, hi, I have a question: `ExecutionGraphBuilder.buildGraph` 
is called from `JobMaster`, but there is only one `jobMasterConfiguration` in 
the JobMaster class. Can `jobMasterConfiguration` represent the 
ClusterConfiguration? 




[jira] [Updated] (FLINK-12360) Translate "Jobs and Scheduling" Page into Chinese

2019-04-28 Thread Armstrong Nova (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armstrong Nova updated FLINK-12360:
---
Description: 
Translate the internal page 
"[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
 to Chinese 

Translate "flink/docs/internals/job_scheduling.md"  into 
"flink/docs/internals/job_scheduling.zh.md"

 

  was:Translate the internal page 
"[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
 to Chinese 

Summary: Translate "Jobs and Scheduling" Page into Chinese  (was: 
Translate "Jobs and Scheduling" Page to Chinese)

> Translate "Jobs and Scheduling" Page into Chinese
> -
>
> Key: FLINK-12360
> URL: https://issues.apache.org/jira/browse/FLINK-12360
> Project: Flink
>  Issue Type: Task
>  Components: chinese-translation, Documentation
>Reporter: Armstrong Nova
>Assignee: Armstrong Nova
>Priority: Major
>
> Translate the internal page 
> "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
>  to Chinese 
> Translate "flink/docs/internals/job_scheduling.md"  into 
> "flink/docs/internals/job_scheduling.zh.md"
>  





[GitHub] [flink] wuchong commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime

2019-04-28 Thread GitBox
wuchong commented on issue #8302: [FLINK-12269][table-blink] Support Temporal 
Table Join in blink planner and runtime
URL: https://github.com/apache/flink/pull/8302#issuecomment-487433247
 
 
   Hi @godfreyhe , I added the `JoinPushExpressionsRule.INSTANCE` rule to the 
predicate pushdown stage to make sure calculation expressions can be pushed 
down (so that `ON a+1 = b` becomes an equi join). That's why several plan test 
results changed. I have checked that the changes are reasonable.
   
   It would be nice if you could have a look too.




[GitHub] [flink] wuchong edited a comment on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime

2019-04-28 Thread GitBox
wuchong edited a comment on issue #8302: [FLINK-12269][table-blink] Support 
Temporal Table Join in blink planner and runtime
URL: https://github.com/apache/flink/pull/8302#issuecomment-487433247
 
 
   Hi @godfreyhe , I added the `JoinPushExpressionsRule.INSTANCE` rule to the 
predicate pushdown stage to make sure calculation expressions can be pushed 
down (so that `ON mod(a, 4) = b` becomes an equi join). That's why several 
plan test results changed. I have checked that the changes are reasonable.
   
   It would be nice if you could have a look too.
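The rewrite the rule performs can be illustrated conceptually. This is a hedged sketch of the planning idea, not Calcite's actual `JoinPushExpressionsRule` code.

```java
// Conceptual illustration: a join condition `ON mod(a, 4) = b` is not an
// equi join on raw columns, but pushing the expression below the join as a
// projected column turns it into one. Not Calcite's actual rule code.
class PushExpressionSketch {
    // Before: join(left, right) ON mod(left.a, 4) = right.b
    // After:  project left as (a, mod(a, 4) AS a4), then
    //         join ON a4 = right.b  -- a plain equi join on two columns
    static int projectedKey(int a) {
        return Math.floorMod(a, 4); // the pushed-down expression
    }

    static boolean equiJoinMatch(int a4, int b) {
        return a4 == b; // the remaining condition is a pure equi join
    }
}
```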




[jira] [Created] (FLINK-12360) Translate "Jobs and Scheduling" Page to Chinese

2019-04-28 Thread Armstrong Nova (JIRA)
Armstrong Nova created FLINK-12360:
--

 Summary: Translate "Jobs and Scheduling" Page to Chinese
 Key: FLINK-12360
 URL: https://issues.apache.org/jira/browse/FLINK-12360
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Reporter: Armstrong Nova
Assignee: Armstrong Nova


Translate the internal page 
"[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
 to Chinese 





[jira] [Updated] (FLINK-12360) Translate "Jobs and Scheduling" Page into Chinese

2019-04-28 Thread Armstrong Nova (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armstrong Nova updated FLINK-12360:
---
Description: 
Translate the internal page 
"[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
 to Chinese 

the doc is located at "flink/docs/internals/job_scheduling.md"; the translated 
doc goes in "flink/docs/internals/job_scheduling.zh.md"

 

  was:
Translate the internal page 
"[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
 to Chinese 

Translate "flink/docs/internals/job_scheduling.md"  into 
"flink/docs/internals/job_scheduling.zh.md"

 


> Translate "Jobs and Scheduling" Page into Chinese
> -
>
> Key: FLINK-12360
> URL: https://issues.apache.org/jira/browse/FLINK-12360
> Project: Flink
>  Issue Type: Task
>  Components: chinese-translation, Documentation
>Reporter: Armstrong Nova
>Assignee: Armstrong Nova
>Priority: Major
>
> Translate the internal page 
> "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]";
>  to Chinese 
> the doc is located at "flink/docs/internals/job_scheduling.md"; the 
> translated doc goes in "flink/docs/internals/job_scheduling.zh.md"
>  





[GitHub] [flink] sunjincheng121 commented on a change in pull request #8230: [FLINK-10977][table] Add streaming non-window FlatAggregate to Table API

2019-04-28 Thread GitBox
sunjincheng121 commented on a change in pull request #8230: 
[FLINK-10977][table] Add streaming non-window FlatAggregate to Table API
URL: https://github.com/apache/flink/pull/8230#discussion_r279223257
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/expressions/AggregateFunctionDefinition.java
 ##
 @@ -31,13 +31,13 @@
 @PublicEvolving
 public final class AggregateFunctionDefinition extends FunctionDefinition {
 
-   private final AggregateFunction aggregateFunction;
+   private final UserDefinedAggregateFunction aggregateFunction;
private final TypeInformation resultTypeInfo;
private final TypeInformation accumulatorTypeInfo;
 
public AggregateFunctionDefinition(
String name,
-   AggregateFunction aggregateFunction,
+   UserDefinedAggregateFunction aggregateFunction,
 
 Review comment:
   Yes, it makes sense to me if we can share more logic between Agg and Table 
Agg, and we do not need to add a new type.




[jira] [Commented] (FLINK-12351) AsyncWaitOperator should deep copy StreamElement when object reuse is enabled

2019-04-28 Thread Jark Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828869#comment-16828869
 ] 

Jark Wu commented on FLINK-12351:
-

Hi [~aitozi], I think fixing the bug in AsyncWaitOperator and enabling 
objectReuse at the operator level are two orthogonal problems. We can create 
another JIRA to discuss the operator-level object reuse problem.

Currently, I have only found the AsyncWaitOperator to be affected, because it 
doesn't deep copy the input record before putting it into its heap buffer 
(a Java ArrayDeque).

IMO, no matter whether object reuse is enabled or not, the AsyncWaitOperator 
should output the same result, because it's framework code, not user code.

Hi [~till.rohrmann], what do you think about this? If you don't object, I can 
create a PR for this.

> AsyncWaitOperator should deep copy StreamElement when object reuse is enabled
> -
>
> Key: FLINK-12351
> URL: https://issues.apache.org/jira/browse/FLINK-12351
> Project: Flink
>  Issue Type: Bug
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.9.0
>
>
> Currently, AsyncWaitOperator directly puts the input StreamElement into 
> {{StreamElementQueue}}. But when object reuse is enabled, the StreamElement 
> is reused, which means the element in {{StreamElementQueue}} will be 
> modified. As a result, the output of AsyncWaitOperator might be wrong.
> An easy way to fix this might be to deep copy the input StreamElement when 
> object reuse is enabled, like this: 
> https://github.com/apache/flink/blob/blink/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/async/AsyncWaitOperator.java#L209
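The proposed fix can be sketched in simplified form. The real AsyncWaitOperator would use Flink's StreamElementSerializer to copy records; here a plain `UnaryOperator` stands in for the serializer's copy method, and the class name is made up.

```java
// Simplified sketch of the proposed fix; not the actual AsyncWaitOperator.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.UnaryOperator;

class BufferingOperatorSketch<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final boolean objectReuseEnabled;
    private final UnaryOperator<T> deepCopy; // stand-in for serializer copy

    BufferingOperatorSketch(boolean objectReuseEnabled, UnaryOperator<T> deepCopy) {
        this.objectReuseEnabled = objectReuseEnabled;
        this.deepCopy = deepCopy;
    }

    void processElement(T element) {
        // With object reuse on, the runtime may mutate `element` after this
        // method returns, so buffer a defensive copy rather than the instance.
        queue.add(objectReuseEnabled ? deepCopy.apply(element) : element);
    }

    T poll() {
        return queue.poll();
    }
}
```

The defensive copy makes the buffered element immune to later mutation of the reused input record, so the operator's output no longer depends on the object-reuse setting.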





[GitHub] [flink] flinkbot edited a comment on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module

2019-04-28 Thread GitBox
flinkbot edited a comment on issue #8205: [FLINK-12238] [hive] Support database 
related operations in GenericHiveMetastoreCatalog and setup 
flink-connector-hive module
URL: https://github.com/apache/flink/pull/8205#issuecomment-484369983
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❗ 3. Needs [attention] from.
   - Needs attention by @kurtyoung, @zentol [PMC]
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
   The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   
   ## Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module

2019-04-28 Thread GitBox
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database 
related operations in GenericHiveMetastoreCatalog and setup 
flink-connector-hive module
URL: https://github.com/apache/flink/pull/8205#issuecomment-487431946
 
 
   Synced with @zentol offline. We cannot depend on flink-shaded-hadoop2 
because it only publishes SNAPSHOT jars for hadoop 2.4, but not 2.6 and above. 
Thus we decided to depend on hadoop directly with "provided" scope.
   
   @KurtYoung would be great to have you take a final look and merge if 
everything looks good




[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module

2019-04-28 Thread GitBox
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database 
related operations in GenericHiveMetastoreCatalog and setup 
flink-connector-hive module
URL: https://github.com/apache/flink/pull/8205#issuecomment-487431970
 
 
   @flinkbot attention @KurtYoung 




[jira] [Comment Edited] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request

2019-04-28 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828843#comment-16828843
 ] 

Chesnay Schepler edited comment on FLINK-12358 at 4/28/19 9:01 PM:
---

I doubt that a diff-based approach will work here properly. For one there's a 
maintenance overhead, since every new relevant file (i.e. header) must be 
covered, but their location is arbitrary. You'd also have to check whether 
changes in the file actually change semantics; for example a javadoc change 
doesn't require regeneration. Finally, you wouldn't be able to determine 
whether changes to the REST API docs are in sync with other changes made.

Given that we already have utilities for reading/writing Rest API snapshots 
(see RestAPIStabilityTest) and examples for parsing the existing html (see 
ConfigOptionsCompletenessTest), I'd be very much in favor of following these 
examples.


was (Author: zentol):
I doubt that a diff-based approach will work here properly. For one there's a 
maintenance overhead, since every new relevant file (i.e. header) must be 
covered, but their location is arbitrary. You'd also have to check whether 
changes in the file actually change semantics; for example a javadoc change 
doesn't require regeneration.

Given that we already have utilities for reading/writing Rest API snapshots 
(see RestAPIStabilityTest) and examples for parsing the existing html (see 
ConfigOptionsCompletenessTest), I'd be very much in favor of a similar approach.

> Verify whether rest documentation needs to be updated when building pull 
> request
> ---
>
> Key: FLINK-12358
> URL: https://issues.apache.org/jira/browse/FLINK-12358
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Minor
>
> Currently, unlike the configuration docs, the REST API docs have no way to 
> verify that they are up to date with the latest code. This is really annoying 
> and not easy to track if checked only by developers.
> I plan to add a check in Travis that uses `git status` to verify whether any 
> files have been updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request

2019-04-28 Thread Chesnay Schepler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828843#comment-16828843
 ] 

Chesnay Schepler commented on FLINK-12358:
--

I doubt that a diff-based approach will work here properly. For one there's a 
maintenance overhead, since every new relevant file (i.e. header) must be 
covered, but their location is arbitrary. You'd also have to check whether 
changes in the file actually change semantics; for example a javadoc change 
doesn't require regeneration.

Given that we already have utilities for reading/writing Rest API snapshots 
(see RestAPIStabilityTest) and examples for parsing the existing html (see 
ConfigOptionsCompletenessTest), I'd be very much in favor of a similar approach.
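The snapshot-based approach could be sketched as follows (hypothetical class and file names, not Flink's actual RestAPIStabilityTest): keep a committed snapshot of the generated REST API docs, regenerate during the build, and fail when they diverge.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of a snapshot check: the generated REST API docs are compared
 *  against a committed snapshot, and the build fails if they diverge. */
public class RestDocsCheck {

    /** Returns true if the committed snapshot no longer matches the
     *  freshly generated documentation. */
    public static boolean isStale(String committedSnapshot, String generated) {
        return !committedSnapshot.equals(generated);
    }

    /** Intended to run as a test: read the committed snapshot file and
     *  compare it against the current generator output. */
    public static void verify(Path snapshotFile, String generated) throws Exception {
        String committed = new String(Files.readAllBytes(snapshotFile));
        if (isStale(committed, generated)) {
            throw new AssertionError(
                "REST API docs are outdated; regenerate them and commit the result.");
        }
    }
}
```

Unlike a `git status` diff, this comparison is independent of where the source headers live and only fires when the generated output actually changes.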

> Verify whether rest documentation needs to be updated when building pull 
> request
> ---
>
> Key: FLINK-12358
> URL: https://issues.apache.org/jira/browse/FLINK-12358
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Minor
>
> Currently, unlike the configuration docs, the REST API docs have no way to 
> verify that they are up to date with the latest code. This is really annoying 
> and not easy to track if checked only by developers.
> I plan to add a check in Travis that uses `git status` to verify whether any 
> files have been updated.





[jira] [Commented] (FLINK-12175) TypeExtractor.getMapReturnTypes produces different TypeInformation than createTypeInformation for classes with parameterized ancestors

2019-04-28 Thread Andrey Bulgakov (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828822#comment-16828822
 ] 

Andrey Bulgakov commented on FLINK-12175:
-

Yes, I am working on this. I have found a solution and am working on a pull 
request.

> TypeExtractor.getMapReturnTypes produces different TypeInformation than 
> createTypeInformation for classes with parameterized ancestors
> --
>
> Key: FLINK-12175
> URL: https://issues.apache.org/jira/browse/FLINK-12175
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.7.2, 1.9.0
>Reporter: Dylan Adams
>Priority: Major
>
> I expect that the {{TypeMapper}} {{createTypeInformation}} and 
> {{getMapReturnTypes}} would produce equivalent type information for the same 
> type. But when there is a parameterized superclass, this does not appear to 
> be the case.
> Here's a test case that could be added to {{PojoTypeExtractorTest.java}} that 
> demonstrates the issue:
> {code}
> public static class Pojo implements Serializable {
>   public int digits;
>   public String letters;
> }
> public static class ParameterizedParent<T> implements Serializable {
>   public T pojoField;
> }
> public static class ConcreteImpl extends ParameterizedParent<Pojo> {
>   public double precise;
> }
> public static class ConcreteMapFunction implements MapFunction<ConcreteImpl, 
> ConcreteImpl> {
>   @Override
>   public ConcreteImpl map(ConcreteImpl value) throws Exception {
>   return null;
>   }
> }
> @Test
> public void testMapReturnType() {
>   final TypeInformation<ConcreteImpl> directTypeInfo = 
> TypeExtractor.createTypeInfo(ConcreteImpl.class);
>   Assert.assertTrue(directTypeInfo instanceof PojoTypeInfo);
>   TypeInformation<?> directPojoFieldTypeInfo = ((PojoTypeInfo<ConcreteImpl>) 
> directTypeInfo).getPojoFieldAt(0).getTypeInformation();
>   Assert.assertTrue(directPojoFieldTypeInfo instanceof PojoTypeInfo);
>   final TypeInformation<ConcreteImpl> mapReturnTypeInfo
>   = TypeExtractor.getMapReturnTypes(new ConcreteMapFunction(), 
> directTypeInfo);
>   Assert.assertTrue(mapReturnTypeInfo instanceof PojoTypeInfo);
>   TypeInformation<?> mapReturnPojoFieldTypeInfo = ((PojoTypeInfo<ConcreteImpl>) 
> mapReturnTypeInfo).getPojoFieldAt(0).getTypeInformation();
>   Assert.assertTrue(mapReturnPojoFieldTypeInfo instanceof PojoTypeInfo);
>   Assert.assertEquals(directTypeInfo, mapReturnTypeInfo);
> }
> {code}
> This test case will fail on the last two asserts because 
> {{getMapReturnTypes}} produces a {{TypeInformation}} for {{ConcreteImpl}} 
> with a {{GenericTypeInfo}} for the {{pojoField}}, whereas 
> {{createTypeInformation}} correctly produces a {{PojoTypeInfo}}.
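The core of a correct extraction here is resolving the superclass's type variable (T → Pojo) instead of falling back to a generic type. A minimal, Flink-independent sketch of that resolution using `java.lang.reflect` (illustrative names, not the TypeExtractor's actual code):

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

/** Sketch: resolve a field whose declared type is a type variable of a
 *  parameterized superclass, as an extractor must do for pojoField. */
public class SuperclassTypeResolver {

    static class Pojo { public int digits; public String letters; }
    static class ParameterizedParent<T> { public T pojoField; }
    static class ConcreteImpl extends ParameterizedParent<Pojo> { public double precise; }

    /** Returns the concrete type bound to the given field when it is declared
     *  with a type variable of the direct generic superclass of 'concrete'. */
    public static Type resolveFieldType(Class<?> concrete, Field field) {
        Type generic = field.getGenericType();
        if (!(generic instanceof TypeVariable)) {
            return generic; // already a concrete type
        }
        // ConcreteImpl's generic superclass carries the actual type argument.
        ParameterizedType superType = (ParameterizedType) concrete.getGenericSuperclass();
        TypeVariable<?>[] vars = ((Class<?>) superType.getRawType()).getTypeParameters();
        for (int i = 0; i < vars.length; i++) {
            if (vars[i].getName().equals(((TypeVariable<?>) generic).getName())) {
                return superType.getActualTypeArguments()[i]; // T -> Pojo
            }
        }
        return generic; // unresolved; a real extractor would recurse further up
    }
}
```

When this resolution step is skipped, the extractor only sees the raw type variable and has to fall back to a generic (Kryo-backed) type, which is exactly the `GenericTypeInfo` symptom described above.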





[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828776#comment-16828776
 ] 

lamber-ken commented on FLINK-12183:


(y)

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job YARN cluster doesn't stop after cancelling a running job if the 
> job has restarted many times, like 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase, before 
> removeJobAndRegisterTerminationFuture() is executed. The CompletableFuture 
> thread exits unexpectedly with a NullPointerException in the 
> archiveExecutionGraph() phase. It is hard to find because only IOException is 
> caught there. In SubtaskExecutionAttemptDetailsHandler and 
> SubtaskExecutionAttemptAccumulatorsHandler, when the archiveJsonWithPath() 
> method is called, it constructs some JSON information about prior execution 
> attempts, but the loop index starts from 0, which might be an already dropped 
> attempt index. By default, trying to get such a prior execution attempt 
> (AccessExecution attempt = subtask.getPriorExecutionAttempt(x)) returns null.
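A defensive handling sketch for the archiving loop (hypothetical types; the real handlers work with Flink's AccessExecutionVertex): skip indices whose prior attempt has been dropped instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Sketch: when archiving prior execution attempts, some indices may already
 *  have been dropped, so the lookup can return null and must be skipped
 *  rather than dereferenced. */
public class PriorAttemptArchiver {

    /** Archives all non-null prior attempts; 'priorAttempt' stands in for
     *  subtask.getPriorExecutionAttempt(i). */
    public static List<String> archive(int attemptCount, IntFunction<String> priorAttempt) {
        List<String> archived = new ArrayList<>();
        for (int i = 0; i < attemptCount; i++) {
            String attempt = priorAttempt.apply(i); // may be null for dropped attempts
            if (attempt == null) {
                continue; // dropped attempt: nothing to archive
            }
            archived.add(attempt);
        }
        return archived;
    }
}
```

With this guard the NullPointerException described above cannot escape the archiving phase and kill the CompletableFuture thread.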





[GitHub] [flink] flinkbot commented on issue #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase

2019-04-28 Thread GitBox
flinkbot commented on issue #8307: [FLINK-12359][metrics][tests] Harden 
SystemResourcesMetricsITCase
URL: https://github.com/apache/flink/pull/8307#issuecomment-487392775
 
 




[jira] [Updated] (FLINK-12359) SystemResourcesMetricsITCase unstable

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12359:
---
Labels: pull-request-available  (was: )

> SystemResourcesMetricsITCase unstable
> -
>
> Key: FLINK-12359
> URL: https://issues.apache.org/jira/browse/FLINK-12359
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics, Tests
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>
> The {{SystemResourcesMetricsITCase}} checks that task managers register a 
> specific set of metrics if configured to do so. The test assumes that the TM 
> is already started completely when the test starts, but this may not be the 
> case.





[GitHub] [flink] zentol opened a new pull request #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase

2019-04-28 Thread GitBox
zentol opened a new pull request #8307: [FLINK-12359][metrics][tests] Harden 
SystemResourcesMetricsITCase
URL: https://github.com/apache/flink/pull/8307
 
 
   ## What is the purpose of the change
   
   Hardens the `SystemResourcesMetricsITCase` against
   * slow starts of a TaskManager, in which case metrics could be registered 
later than the test expects
   * the host portion of the metric identifier not being `localhost`
   
   We now exclude the host from the identifier, and generate a future in the 
reporter that we wait on instead.
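The "future in the reporter" idea can be sketched like this (simplified names; the real test implements Flink's MetricReporter interface): the reporter completes a future once every expected metric has been registered, and the test blocks on that future instead of assuming the TM is already up.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Sketch: a reporter that completes a future once every expected metric
 *  name has been registered, so the test can wait instead of racing the TM. */
public class WaitingReporter {
    private final Set<String> expected;
    private final CompletableFuture<Void> allRegistered = new CompletableFuture<>();

    public WaitingReporter(Set<String> expectedMetricNames) {
        this.expected = ConcurrentHashMap.newKeySet();
        this.expected.addAll(expectedMetricNames);
    }

    /** Called by the metric system whenever a metric is registered. */
    public void notifyOfAddedMetric(String metricName) {
        expected.remove(metricName);
        if (expected.isEmpty()) {
            allRegistered.complete(null);
        }
    }

    /** The test waits here instead of assuming the TM started already. */
    public void awaitAllMetrics(long timeout, TimeUnit unit) throws Exception {
        allRegistered.get(timeout, unit);
    }
}
```

This turns the flaky "metrics should already be there" assertion into a bounded wait that tolerates slow TaskManager startup.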
   




[jira] [Assigned] (FLINK-12139) Flink on mesos - Parameterize disk space needed.

2019-04-28 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-12139:
-

Assignee: Juan  (was: Oleksandr Nitavskyi)

> Flink on mesos - Parameterize disk space needed.
> 
>
> Key: FLINK-12139
> URL: https://issues.apache.org/jira/browse/FLINK-12139
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Mesos
>Reporter: Juan
>Assignee: Juan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are having a small issue while trying to deploy Flink on Mesos using 
> Marathon. In our setup of Mesos we are required to specify the amount of 
> disk space we want for the applications we deploy there.
> The current default value in Flink is 0 and it is not parameterizable. This 
> means that we ask for 0 disk space for our instances, so Flink can't work.





[jira] [Resolved] (FLINK-12139) Flink on mesos - Parameterize disk space needed.

2019-04-28 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-12139.
---
   Resolution: Fixed
Fix Version/s: 1.9.0

Fixed via cf10e70088645b0fcc54ab03ecec3ef372ddb652

> Flink on mesos - Parameterize disk space needed.
> 
>
> Key: FLINK-12139
> URL: https://issues.apache.org/jira/browse/FLINK-12139
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Mesos
>Reporter: Juan
>Assignee: Juan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are having a small issue while trying to deploy Flink on Mesos using 
> Marathon. In our setup of Mesos we are required to specify the amount of 
> disk space we want for the applications we deploy there.
> The current default value in Flink is 0 and it is not parameterizable. This 
> means that we ask for 0 disk space for our instances, so Flink can't work.





[GitHub] [flink] tillrohrmann closed pull request #8224: [FLINK-12139][Mesos] Add disk space parameter.

2019-04-28 Thread GitBox
tillrohrmann closed pull request #8224: [FLINK-12139][Mesos] Add disk space 
parameter.
URL: https://github.com/apache/flink/pull/8224
 
 
   




[jira] [Updated] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12342:
---
Labels: pull-request-available  (was: )

> Yarn Resource Manager Acquires Too Many Containers
> --
>
> Key: FLINK-12342
> URL: https://issues.apache.org/jira/browse/FLINK-12342
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Environment: We runs job in Flink release 1.6.3. 
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Critical
>  Labels: pull-request-available
>
> In the current implementation of YarnFlinkResourceManager, containers are 
> acquired one by one when a request arrives from the SlotManager. The mechanism 
> works when the job is small, say fewer than 32 containers. If the job has 256 
> containers, containers can't be allocated immediately, and pending requests in 
> the AMRMClient are not removed accordingly. We observed that the AMRMClient 
> asks for the current pending requests + 1 (the new request from the slot 
> manager) containers. As a result, during the start-up of such a job, it asked 
> for 4000+ containers. If an external dependency issue happens, for example 
> slow HDFS access, the whole job is blocked without getting enough resources 
> and is finally killed with a SlotManager request timeout.
> Thus, we should use the total number of containers asked for, rather than the 
> pending requests in the AMRMClient, as the threshold for deciding whether we 
> need to add one more resource request.





[GitHub] [flink] HuangZhenQiu opened a new pull request #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to…

2019-04-28 Thread GitBox
HuangZhenQiu opened a new pull request #8306: [FLINK-12342] Use total requested 
pending containers to bound the maximum containers to…
URL: https://github.com/apache/flink/pull/8306
 
 
   ## What is the purpose of the change
   
   Fixes the issue of requesting too many containers in FlinkResourceManager. 
Originally, each new request is bounded by the number of pending containers in 
AMRMClientAsync. Since AMRMClientAsync issues a request for pendingContainers + 1 
containers every time, it always asks for more than we need. Especially when a 
Flink application needs 100+ containers, the total number of requested containers 
can be in the thousands. The issue is magnified when we do cluster maintenance 
and restart 1000+ applications as quickly as possible. Too many requests overload 
the YARN resource manager and introduce extra delay for restarting applications.
   
   ## Brief change log
   
 - record the total number of containers that have been requested and are 
still pending
 - use totalRequestedPendingContainers as the upper bound for new container 
acquisition
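The change described above boils down to bounding new requests by what has already been asked for. A simplified sketch (hypothetical field and method names, not the actual YarnResourceManager code):

```java
/** Sketch: only issue a new container request when the number of containers
 *  already requested and still pending does not cover the pending slot needs. */
public class ContainerRequestBounder {
    private int totalRequestedPendingContainers = 0;

    /** Returns true if a new container request should be issued. Previously the
     *  decision was based on the AMRMClient's internal pending list, which grows
     *  by pendingContainers + 1 on every ask and therefore over-requests. */
    public boolean shouldRequestContainer(int pendingSlotRequests) {
        return totalRequestedPendingContainers < pendingSlotRequests;
    }

    /** Bookkeeping: a request was issued to YARN. */
    public void onContainerRequested() {
        totalRequestedPendingContainers++;
    }

    /** Bookkeeping: YARN allocated one of the pending containers. */
    public void onContainerAllocated() {
        totalRequestedPendingContainers = Math.max(0, totalRequestedPendingContainers - 1);
    }
}
```

With this bound, a job needing N containers never has more than N outstanding requests, regardless of how slowly YARN allocates them.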
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable)
   




[GitHub] [flink] flinkbot commented on issue #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to…

2019-04-28 Thread GitBox
flinkbot commented on issue #8306: [FLINK-12342] Use total requested pending 
containers to bound the maximum containers to…
URL: https://github.com/apache/flink/pull/8306#issuecomment-487389322
 
 




[jira] [Commented] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations

2019-04-28 Thread Gyula Fora (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828040#comment-16828040
 ] 

Gyula Fora commented on FLINK-12141:


[~yangfei] I haven't started any work on this, so if you are interested, please 
go ahead :) 

> Allow @TypeInfo annotation on POJO field declarations
> -
>
> Key: FLINK-12141
> URL: https://issues.apache.org/jira/browse/FLINK-12141
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Type Serialization System
>Reporter: Gyula Fora
>Priority: Minor
>  Labels: starter
>
> The TypeInfo annotation is a great way to declare serializers for custom 
> types; however, I feel that its usage is limited by the fact that it can only 
> be used on types that are declared in the project.
> By allowing the annotation to be used on field declarations we could improve 
> the TypeExtractor logic to use the type factory when creating the 
> PojoTypeInformation.
> This would be a big improvement as in many cases classes from other libraries 
> or collection types are used within custom Pojo classes and Flink would 
> default to Kryo serialization which would hurt performance and cause problems 
> later.
> The current workaround in these cases is to implement a custom serializer for 
> the entire pojo which is a waste of effort when only a few fields might 
> require custom serialization logic.
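What the proposed field-level usage might look like, sketched with a locally defined stand-in annotation since the feature does not exist yet (Flink's real @TypeInfo takes a TypeInfoFactory class; all names below are illustrative):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Sketch of the proposed feature: allowing a type-info annotation directly
 *  on a POJO field, so only that field needs a custom factory. */
public class FieldTypeInfoSketch {

    // Stand-in for Flink's @TypeInfo, extended to be legal on fields too.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.FIELD})
    @interface TypeInfo {
        Class<?> value(); // the TypeInfoFactory to use for this field
    }

    static class ThirdPartyType { }            // e.g. a class from another library
    static class ThirdPartyTypeInfoFactory { } // hypothetical factory for it

    static class MyPojo {
        public int id;

        // Only this field needs custom serialization; the rest of the POJO
        // would still go through the normal POJO path instead of Kryo.
        @TypeInfo(ThirdPartyTypeInfoFactory.class)
        public ThirdPartyType payload;
    }
}
```

The TypeExtractor would then consult the field annotation when building the PojoTypeInformation, avoiding a whole-class custom serializer for one problematic field.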





[jira] [Assigned] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations

2019-04-28 Thread Gyula Fora (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-12141:
--

Assignee: YangFei

> Allow @TypeInfo annotation on POJO field declarations
> -
>
> Key: FLINK-12141
> URL: https://issues.apache.org/jira/browse/FLINK-12141
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Type Serialization System
>Reporter: Gyula Fora
>Assignee: YangFei
>Priority: Minor
>  Labels: starter
>
> The TypeInfo annotation is a great way to declare serializers for custom 
> types; however, I feel that its usage is limited by the fact that it can only 
> be used on types that are declared in the project.
> By allowing the annotation to be used on field declarations we could improve 
> the TypeExtractor logic to use the type factory when creating the 
> PojoTypeInformation.
> This would be a big improvement as in many cases classes from other libraries 
> or collection types are used within custom Pojo classes and Flink would 
> default to Kryo serialization which would hurt performance and cause problems 
> later.
> The current workaround in these cases is to implement a custom serializer for 
> the entire pojo which is a waste of effort when only a few fields might 
> require custom serialization logic.





[GitHub] [flink] flinkbot commented on issue #8305: [hotfix] regenerate rest-docs to latest code

2019-04-28 Thread GitBox
flinkbot commented on issue #8305: [hotfix] regenerate rest-docs to latest code
URL: https://github.com/apache/flink/pull/8305#issuecomment-487389173
 
 




[jira] [Created] (FLINK-12359) SystemResourcesMetricsITCase unstable

2019-04-28 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12359:


 Summary: SystemResourcesMetricsITCase unstable
 Key: FLINK-12359
 URL: https://issues.apache.org/jira/browse/FLINK-12359
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Tests
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.0


The {{SystemResourcesMetricsITCase}} checks that task managers register a 
specific set of metrics if configured to do so. The test assumes that the TM is 
already started completely when the test starts, but this may not be the case.





[GitHub] [flink] Myasuka opened a new pull request #8305: [hotfix] regenerate rest-docs to latest code

2019-04-28 Thread GitBox
Myasuka opened a new pull request #8305: [hotfix] regenerate rest-docs to 
latest code
URL: https://github.com/apache/flink/pull/8305
 
 
   ## What is the purpose of the change
   
   Regenerate rest-docs to match the latest code. I happened to notice this when 
planning to implement 
[FLINK-12333](https://issues.apache.org/jira/browse/FLINK-12333). To avoid such 
hotfixes in the future, I created 
[FLINK-12358](https://issues.apache.org/jira/browse/FLINK-12358) to track this 
problem.
   
   
   ## Brief change log
   
   Regenerate rest-docs to match the latest code.
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): **no**
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
 - The serializers: **no**
 - The runtime per-record code paths (performance sensitive): **no**
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
 - The S3 file system connector: **no**
   
   ## Documentation
   
 - Does this pull request introduce a new feature? **no**
 - If yes, how is the feature documented? **not applicable**
   




[jira] [Created] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request

2019-04-28 Thread Yun Tang (JIRA)
Yun Tang created FLINK-12358:


 Summary: Verify whether rest documentation needs to be updated when 
building pull request
 Key: FLINK-12358
 URL: https://issues.apache.org/jira/browse/FLINK-12358
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Yun Tang
Assignee: Yun Tang


Currently, unlike the configuration docs, the REST API docs have no way to 
verify that they are up to date with the latest code. This is really annoying 
and not easy to track if checked only by developers.

I plan to add a check in Travis that uses `git status` to verify whether any 
files have been updated.





[GitHub] [flink] flinkbot commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files

2019-04-28 Thread GitBox
flinkbot commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid 
failure by uncleaned files
URL: https://github.com/apache/flink/pull/8304#issuecomment-487386856
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

## Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[GitHub] [flink] Aitozi commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files

2019-04-28 Thread GitBox
Aitozi commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid 
failure by uncleaned files
URL: https://github.com/apache/flink/pull/8304#issuecomment-487386975
 
 
   And I see quite a lot of other tests rely on `deleteOnExit` to do cleanup. 
Should I fix them, too? Please take a look when you are free. Thanks @zentol.




[GitHub] [flink] Aitozi opened a new pull request #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files

2019-04-28 Thread GitBox
Aitozi opened a new pull request #8304: [hotfix]Harden TextInputFormatTest to 
avoid failure by uncleaned files
URL: https://github.com/apache/flink/pull/8304
 
 
   ## What is the purpose of the change
   
   When I ran the test locally, it happened to fail and left uncleaned temp 
files in the tmp dir. The test case could not pass again until I deleted the 
corresponding files.
   
   The test uses `deleteOnExit` to clean up the temp files it creates, but that 
method does not run when the JVM crashes or is killed. I think this makes the 
test case fragile.
   
   ## Brief change log
   
   Add @Before and @After preparation / cleanup methods
   
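A sketch of the change's shape (hypothetical names; in the actual test these methods would carry JUnit's `@Before`/`@After` annotations, shown here as comments so the snippet stays dependency-free):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Explicit setup/teardown instead of File.deleteOnExit(): deleteOnExit only
// runs on a clean JVM shutdown, so a crashed or killed run leaves temp files
// behind and can poison the next run.
public class TempDirCleanupSketch {
    File tempDir;

    // @Before in the real JUnit test
    public void createTempDir() throws IOException {
        tempDir = Files.createTempDirectory("text-input-format-test").toFile();
    }

    // @After in the real JUnit test: always delete, even if the test failed
    public void deleteTempDir() {
        deleteRecursively(tempDir);
    }

    static void deleteRecursively(File f) {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        f.delete();
    }
}
```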




[GitHub] [flink] klion26 commented on issue #8164: [FLINK-12184] history server compatible with old version

2019-04-28 Thread GitBox
klion26 commented on issue #8164: [FLINK-12184] history server compatible with 
old version
URL: https://github.com/apache/flink/pull/8164#issuecomment-487385580
 
 
   @yumengz5 thanks for your reply, I'll file a new pr for this issue.




[GitHub] [flink] HuangZhenQiu commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration

2019-04-28 Thread GitBox
HuangZhenQiu commented on issue #8303: [FLINK-12343]add file replication config 
for yarn configuration
URL: https://github.com/apache/flink/pull/8303#issuecomment-487385194
 
 
   @rmetzger @tillrohrmann 
   Please have a look at this PR when you have time.




[jira] [Updated] (FLINK-12343) Allow set file.replication in Yarn Configuration

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12343:
---
Labels: pull-request-available  (was: )

> Allow set file.replication in Yarn Configuration
> 
>
> Key: FLINK-12343
> URL: https://issues.apache.org/jira/browse/FLINK-12343
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, FlinkYarnSessionCli uploads jars into HDFS with the default of 3 
> replicas. From our production experience, we find that 3 replicas can block 
> a big job (256 containers) from launching when HDFS is slow due to a heavy 
> batch-pipeline workload. Thus, we want to make the factor customizable from 
> FlinkYarnSessionCli by adding an option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] flinkbot commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration

2019-04-28 Thread GitBox
flinkbot commented on issue #8303: [FLINK-12343]add file replication config for 
yarn configuration
URL: https://github.com/apache/flink/pull/8303#issuecomment-487385120
 
 




[GitHub] [flink] HuangZhenQiu opened a new pull request #8303: [FLINK-12343]add file replication config for yarn configuration

2019-04-28 Thread GitBox
HuangZhenQiu opened a new pull request #8303: [FLINK-12343]add file replication 
config for yarn configuration
URL: https://github.com/apache/flink/pull/8303
 
 
   ## What is the purpose of the change
   This pull request adds a file replication config to Flink's YARN 
configuration. Raising the default replication factor (3 for HDFS) helps 
accelerate container bootstrap when a Flink application needs 100+ containers.
   
   ## Brief change log
 - Add a file-replication option in YarnConfigOptions
 - Apply the file-replication value to each of the application master's local 
resources
 - Apply the default file replication value of 3 for task manager local 
resources
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (docs)
   




[jira] [Updated] (FLINK-12343) Allow set file.replication in Yarn Configuration

2019-04-28 Thread Zhenqiu Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenqiu Huang updated FLINK-12343:
--
Summary: Allow set file.replication in Yarn Configuration  (was: Allow set 
hdfs.replication in Yarn Configuration)

> Allow set file.replication in Yarn Configuration
> 
>
> Key: FLINK-12343
> URL: https://issues.apache.org/jira/browse/FLINK-12343
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Minor
>
> Currently, FlinkYarnSessionCli uploads jars into HDFS with the default of 3 
> replicas. From our production experience, we find that 3 replicas can block 
> a big job (256 containers) from launching when HDFS is slow due to a heavy 
> batch-pipeline workload. Thus, we want to make the factor customizable from 
> FlinkYarnSessionCli by adding an option.





[GitHub] [flink] flinkbot commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime

2019-04-28 Thread GitBox
flinkbot commented on issue #8302: [FLINK-12269][table-blink] Support Temporal 
Table Join in blink planner and runtime
URL: https://github.com/apache/flink/pull/8302#issuecomment-487382537
 
 




[jira] [Updated] (FLINK-12269) Support Temporal Table Join in blink planner

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12269:
---
Labels: pull-request-available  (was: )

> Support Temporal Table Join in blink planner
> 
>
> Key: FLINK-12269
> URL: https://issues.apache.org/jira/browse/FLINK-12269
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner
>Reporter: Jark Wu
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
>
> Support translating the following "FOR SYSTEM_TIME AS OF" query into 
> {{StreamExecTemporalTableJoin}}:
> {code:sql}
> SELECT
>   o.amount, o.currency, r.rate, o.amount * r.rate
> FROM
>   Orders AS o
>   JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
>   ON r.currency = o.currency
> {code}
> This is an extension of the current temporal join (FLINK-9738) using the 
> standard syntax introduced in Calcite 1.19.





[GitHub] [flink] wuchong opened a new pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime

2019-04-28 Thread GitBox
wuchong opened a new pull request #8302: [FLINK-12269][table-blink] Support 
Temporal Table Join in blink planner and runtime
URL: https://github.com/apache/flink/pull/8302
 
 
   
   ## What is the purpose of the change
   
   Support translating "FOR SYSTEM_TIME AS OF" queries into temporal table 
joins for both batch and stream.
   
   ## Brief change log
   
 - Introduce API interfaces to support lookup table sources:
   - `LookupableTableSource`, `AsyncTableFunction`, `DefinedPrimaryKey`, 
`DefinedTableIndex`, `TableIndex`, `LookupConfig`
- Support for generating optimized logical & physical plan for temporal 
table join for STREAM and BATCH.
- Introduce temporal table join operators and async join operators for 
runtime.

   Some differences between this PR and the blink branch:
   
- Support returning multiple rows from an async temporal table join
- Only allow `FOR SYSTEM_TIME AS OF` on the left table's proctime field, not on 
the `PROCTIME()` builtin function. This keeps the syntax clean.
- Lookup functions support external types (i.e. `String`, `Timestamp`...).
   
   
   ## Verifying this change
   
   - Add several validation tests, plan tests , and integration tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   




[jira] [Commented] (FLINK-12343) Allow set hdfs.replication in Yarn Configuration

2019-04-28 Thread Zhenqiu Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827997#comment-16827997
 ] 

Zhenqiu Huang commented on FLINK-12343:
---

It can be set either way. The goal here is to set a bigger number for YARN 
applications that need more than 128 containers, to accelerate container 
initialization during bootstrap. Changing the replication factor of a directory 
only affects the existing files; new files under the directory are created with 
the cluster's default replication factor (dfs.replication from hdfs-site.xml). 
As we don't want to apply a bigger replication number to any other files in 
HDFS, I think we have to set it from the client side. What do you think?
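As a config sketch, assuming the option is named as in this PR's proposal (the key below is hypothetical until the change is merged), the client-side setting might look like:

```yaml
# flink-conf.yaml — hypothetical option key from this PR. Only files uploaded
# by the Flink client get the higher replication; the cluster-wide HDFS
# default (dfs.replication) stays untouched for all other files.
yarn.file-replication: 10
```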

> Allow set hdfs.replication in Yarn Configuration
> 
>
> Key: FLINK-12343
> URL: https://issues.apache.org/jira/browse/FLINK-12343
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Minor
>
> Currently, FlinkYarnSessionCli uploads jars into HDFS with the default of 3 
> replicas. From our production experience, we find that 3 replicas can block 
> a big job (256 containers) from launching when HDFS is slow due to a heavy 
> batch-pipeline workload. Thus, we want to make the factor customizable from 
> FlinkYarnSessionCli by adding an option.





[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers

2019-04-28 Thread Zhenqiu Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827993#comment-16827993
 ] 

Zhenqiu Huang commented on FLINK-12342:
---

[~till.rohrmann]
The job that hits the issue tries to acquire 256 containers. At the time it 
starts, the HDFS that serves the jar is relatively slow. You can imagine the 
sum 1 + 2 + 3 + ... + T. It stops acquiring new containers when 256 - T < the 
pending requests in AMRMClientAsync.
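A toy model of this snowballing (not Flink code; all names below are made up for illustration):

```java
// Toy model of the reported behaviour: each new slot request makes AMRMClient
// ask for (current pending + 1) containers, and unfulfilled requests are never
// removed, so the total asked after n requests is 1 + 2 + ... + n = n(n+1)/2.
public class ContainerRequestModel {

    /** Total containers asked for after n slot requests with no allocations. */
    static long totalAsked(int n) {
        long pending = 0;
        long total = 0;
        for (int i = 0; i < n; i++) {
            long ask = pending + 1; // current pending + the one new request
            total += ask;
            pending = ask;          // nothing is fulfilled, so all stay pending
        }
        return total;
    }

    public static void main(String[] args) {
        // For a 256-container job the cumulative ask explodes far past 256:
        System.out.println(totalAsked(256)); // 32896 under this model
    }
}
```

Under this model, thresholding on the total number of containers already asked for (rather than on the pending count) caps the requests at the job's actual need, which is the fix the issue description proposes.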

> Yarn Resource Manager Acquires Too Many Containers
> --
>
> Key: FLINK-12342
> URL: https://issues.apache.org/jira/browse/FLINK-12342
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Environment: We run jobs in Flink release 1.6.3. 
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Critical
>
> In the current implementation of YarnFlinkResourceManager, it acquires new 
> containers one by one when it gets a request from the SlotManager. The 
> mechanism works when the job is small, say fewer than 32 containers. If the 
> job needs 256 containers, they cannot be allocated immediately, and the 
> pending requests in AMRMClient are not removed accordingly. We observed 
> AMRMClient asking for the current pending requests + 1 (the new request from 
> the SlotManager). In this way, during the start-up of such a job, it asked 
> for 4000+ containers. If an external dependency issue occurs, for example 
> slow HDFS access, the whole job is blocked without getting enough resources 
> and is finally killed with a SlotManager request timeout.
> Thus, we should use the total number of containers asked for, rather than the 
> pending requests in AMRMClient, as the threshold for deciding whether we need 
> to add one more resource request.





[jira] [Closed] (FLINK-12307) Support translation from StreamExecWindowJoin to StreamTransformation.

2019-04-28 Thread Kurt Young (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young closed FLINK-12307.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

merged in 1.9.0: 177e7ff1291f397294f7b7dab1d791a07adc6a5a

> Support translation from StreamExecWindowJoin to StreamTransformation.
> --
>
> Key: FLINK-12307
> URL: https://issues.apache.org/jira/browse/FLINK-12307
> Project: Flink
>  Issue Type: Task
>  Components: Table SQL / API
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [flink] KurtYoung merged pull request #8261: [FLINK-12307][table-planner-blink] Support translation from StreamExecWindowJoin to StreamTransformation.

2019-04-28 Thread GitBox
KurtYoung merged pull request #8261: [FLINK-12307][table-planner-blink] Support 
translation from StreamExecWindowJoin to StreamTransformation.
URL: https://github.com/apache/flink/pull/8261
 
 
   




[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread Yumeng Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827982#comment-16827982
 ] 

Yumeng Zhang commented on FLINK-12183:
--

lamber-ken, I agree with you. It's not a good idea to break the interface. But 
like I said, just skipping null values looks a little weird; it seems we don't 
know why there are null values. 
Regarding the number of attempts: in general, we cannot let a streaming job 
attempt too many times. But consider this case: if our streaming job hits a bad 
record that it cannot handle, and we have enabled checkpointing but not 
configured a restart strategy, it will try to recover again and again until we 
notice. The number of attempts depends on how soon we find it; we cannot 
guarantee it is only a few. Maybe it's efficient to check the elements one by 
one in EvictingBoundedList, but I think the best way to solve this null pointer 
problem is to not break the interface and still get the right start index.
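As an illustration of "get the right start index" (a sketch with made-up names, not the actual EvictingBoundedList API):

```java
// A bounded list that evicts old entries keeps only the last `capacity`
// elements; earlier logical indices are gone and reading them yields null.
// Instead of looping from 0 and skipping nulls, compute the first index that
// is still retained and start the loop there.
public class EvictionIndexSketch {

    /** First non-evicted logical index for `size` elements ever added. */
    static int firstRetainedIndex(int size, int capacity) {
        return Math.max(0, size - capacity);
    }
}
```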

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job YARN cluster doesn't stop after cancelling a running job if the 
> job restarted many times, like 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase, before 
> removeJobAndRegisterTerminationFuture() is executed. The CompletableFuture 
> thread exits unexpectedly with a NullPointerException in the 
> archiveExecutionGraph() phase. It's hard to spot because only IOException is 
> caught there. In SubtaskExecutionAttemptDetailsHandler and 
> SubtaskExecutionAttemptAccumulatorsHandler, the archiveJsonWithPath() method 
> constructs JSON information about prior execution attempts, but the loop 
> index starts from 0, which might be an index that was already dropped. By 
> default, getting such a prior execution attempt returns null 
> (AccessExecution attempt = subtask.getPriorExecutionAttempt(x)).





[GitHub] [flink] flinkbot commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig

2019-04-28 Thread GitBox
flinkbot commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove 
useless code in TableConfig
URL: https://github.com/apache/flink/pull/8301#issuecomment-487373286
 
 




[jira] [Updated] (FLINK-12357) Remove useless code in TableConfig

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12357:
---
Labels: pull-request-available  (was: )

> Remove useless code in TableConfig
> --
>
> Key: FLINK-12357
> URL: https://issues.apache.org/jira/browse/FLINK-12357
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Hequn Cheng
>Assignee: Hequn Cheng
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [flink] hequn8128 opened a new pull request #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig

2019-04-28 Thread GitBox
hequn8128 opened a new pull request #8301: [FLINK-12357][api-java][hotfix] 
Remove useless code in TableConfig
URL: https://github.com/apache/flink/pull/8301
 
 
   
   ## What is the purpose of the change
   
   Remove useless code in TableConfig. It was added accidentally in 
[FLINK-11067](https://issues.apache.org/jira/browse/FLINK-11067).
   
   
   ## Brief change log
   
 - Remove `userDefinedConfig` in `TableConfig`
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] yumengz5 commented on issue #8164: [FLINK-12184] history server compatible with old version

2019-04-28 Thread GitBox
yumengz5 commented on issue #8164: [FLINK-12184] history server compatible with 
old version
URL: https://github.com/apache/flink/pull/8164#issuecomment-487372370
 
 
   @klion26 Sorry about the late reply. All right. You can take over it.




[jira] [Commented] (FLINK-12343) Allow set hdfs.replication in Yarn Configuration

2019-04-28 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827957#comment-16827957
 ] 

Till Rohrmann commented on FLINK-12343:
---

Isn't the replication factor an HDFS configuration you need to specify when 
starting the HDFS cluster or can Yarn applications specify this on their own?

> Allow set hdfs.replication in Yarn Configuration
> 
>
> Key: FLINK-12343
> URL: https://issues.apache.org/jira/browse/FLINK-12343
> Project: Flink
>  Issue Type: Improvement
>  Components: Command Line Client, Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Minor
>
> Currently, FlinkYarnSessionCli uploads jars into HDFS with the default of 3 
> replicas. From our production experience, we find that 3 replicas can block 
> a big job (256 containers) from launching when HDFS is slow due to a heavy 
> batch-pipeline workload. Thus, we want to make the factor customizable from 
> FlinkYarnSessionCli by adding an option.





[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers

2019-04-28 Thread Till Rohrmann (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827956#comment-16827956
 ] 

Till Rohrmann commented on FLINK-12342:
---

Thanks for reporting this issue [~hpeter]. For my understanding, will the 
previous container requests be added up with every call to 
{{AMRMClientAsync#addContainerRequest}} to the total number of container 
requests? So to say, the system would request 1 + 2 + 3 + 4 + 5 + ... + n if 
calling n times {{addContainerRequest}} and if no container request could be 
fulfilled in the meantime? Or how does it happen that we add 4000+ container 
requests to the {{AMRMClient}}? 

> Yarn Resource Manager Acquires Too Many Containers
> --
>
> Key: FLINK-12342
> URL: https://issues.apache.org/jira/browse/FLINK-12342
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Environment: We run jobs in Flink release 1.6.3. 
>Reporter: Zhenqiu Huang
>Assignee: Zhenqiu Huang
>Priority: Critical
>
> In the current implementation of YarnFlinkResourceManager, it acquires new 
> containers one by one when it gets a request from the SlotManager. The 
> mechanism works when the job is small, say fewer than 32 containers. If the 
> job needs 256 containers, they cannot be allocated immediately, and the 
> pending requests in AMRMClient are not removed accordingly. We observed 
> AMRMClient asking for the current pending requests + 1 (the new request from 
> the SlotManager). In this way, during the start-up of such a job, it asked 
> for 4000+ containers. If an external dependency issue occurs, for example 
> slow HDFS access, the whole job is blocked without getting enough resources 
> and is finally killed with a SlotManager request timeout.
> Thus, we should use the total number of containers asked for, rather than the 
> pending requests in AMRMClient, as the threshold for deciding whether we need 
> to add one more resource request.





[GitHub] [flink] link3280 closed pull request #8299: [FLINK-12356][BuildSystem] Optimise version expression of flink-shaded-hadoop2(-uber)

2019-04-28 Thread GitBox
link3280 closed pull request #8299: [FLINK-12356][BuildSystem] Optimise version 
expression of flink-shaded-hadoop2(-uber)
URL: https://github.com/apache/flink/pull/8299
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (FLINK-12346) Scala-suffix check broken on Travis

2019-04-28 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-12346.

Resolution: Fixed

master: 8f02746ec7316ccdc6d25e915f3b58595438cc44

> Scala-suffix check broken on Travis
> ---
>
> Key: FLINK-12346
> URL: https://issues.apache.org/jira/browse/FLINK-12346
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Travis
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> the scala-suffix check currently does not work on Travis since the Maven 
> output is not what the script expects. On Travis the Maven output contains 
> timestamps, which break the parsing.
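One way to make such parsing robust is to strip the timestamp prefix from each log line before matching; a minimal sketch (the exact timestamp format is assumed here for illustration, not taken from the Travis setup):

```java
import java.util.regex.Pattern;

// Strips an assumed "HH:MM:SS.mmm " prefix from Maven log lines so that
// downstream parsing sees the same text with and without timestamps.
public class MavenLogCleaner {

    private static final Pattern TIMESTAMP =
        Pattern.compile("^\\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\s+");

    public static String stripTimestamp(String line) {
        return TIMESTAMP.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripTimestamp("12:34:56.789 [INFO] flink-core"));
    }
}
```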



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] zentol merged pull request #8289: [FLINK-12346][travis][build] Account for timestamps in scala-suffix check

2019-04-28 Thread GitBox
zentol merged pull request #8289: [FLINK-12346][travis][build] Account for 
timestamps in scala-suffix check
URL: https://github.com/apache/flink/pull/8289
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-12260) Slot allocation failure by taskmanager registration timeout and race

2019-04-28 Thread Xiaolei Jia (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827919#comment-16827919
 ] 

Xiaolei Jia commented on FLINK-12260:
-

Hi,

If this issue is still open, I'd like to fix this problem with an attempt 
count, as [~till.rohrmann] suggested.  

Thanks.

> Slot allocation failure by taskmanager registration timeout and race
> 
>
> Key: FLINK-12260
> URL: https://issues.apache.org/jira/browse/FLINK-12260
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.3
>Reporter: Hwanju Kim
>Priority: Critical
> Attachments: FLINK-12260-repro.diff
>
>
>  
> In 1.6.2, we have seen slot allocation failures keep happening for a long 
> time. Having looked at the log, I see the following behavior:
>  # The TM sends a registration request R1 to the resource manager.
>  # R1 times out after 100ms, which is the initial timeout.
>  # The TM retries with a registration request R2 to the resource manager 
> (with timeout 200ms).
>  # R2 arrives first at the resource manager and is registered; the TM then 
> gets a successful response, moving on to step 5 below.
>  # On successful registration, R2's instance is put into 
> taskManagerRegistrations.
>  # Then R1 arrives at the resource manager, which sees that the same TM 
> resource ID is already registered and therefore unregisters R2's instance ID 
> from taskManagerRegistrations. A new instance ID for R1 is registered in 
> workerRegistration.
>  # R1's response is not handled, though, since it already timed out (see the 
> akka temp actor resolve failure below), hence no registration in 
> taskManagerRegistrations.
>  # The TM keeps heartbeating to the resource manager with its slot status.
>  # The resource manager ignores this slot status, since 
> taskManagerRegistrations contains R2, not R1, which replaced R2 in 
> workerRegistration at step 6.
>  # The slot request can never be fulfilled and times out.
> The following is the debug logs for the above steps:
>  
> {code:java}
> JM log:
> 2019-04-11 22:39:40.000,Registering TaskManager with ResourceID 
> 46c8e0d0fcf2c306f11954a1040d5677 
> (akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at 
> ResourceManager
> 2019-04-11 22:39:40.000,Registering TaskManager 
> 46c8e0d0fcf2c306f11954a1040d5677 under deade132e2c41c52019cdc27977266cf at 
> the SlotManager.
> 2019-04-11 22:39:40.000,Replacing old registration of TaskExecutor 
> 46c8e0d0fcf2c306f11954a1040d5677.
> 2019-04-11 22:39:40.000,Unregister TaskManager 
> deade132e2c41c52019cdc27977266cf from the SlotManager.
> 2019-04-11 22:39:40.000,Registering TaskManager with ResourceID 
> 46c8e0d0fcf2c306f11954a1040d5677 
> (akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at 
> ResourceManager
> TM log:
> 2019-04-11 22:39:40.000,Registration at ResourceManager attempt 1 
> (timeout=100ms)
> 2019-04-11 22:39:40.000,Registration at ResourceManager 
> (akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager) attempt 1 
> timed out after 100 ms
> 2019-04-11 22:39:40.000,Registration at ResourceManager attempt 2 
> (timeout=200ms)
> 2019-04-11 22:39:40.000,Successful registration at resource manager 
> akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager under 
> registration id deade132e2c41c52019cdc27977266cf.
> 2019-04-11 22:39:41.000,resolve of path sequence [/temp/$c] failed{code}
>  
> As RPC calls seem to use akka ask, which creates a temporary source actor, I 
> think multiple RPC calls could have arrived out of order via different actor 
> pairs, and the symptom above seems to be due to that. If so, could the call 
> carry an attempt count in its arguments to prevent the unexpected 
> unregistration? At this point I have only done log analysis; I could 
> investigate further, but first wanted to check whether this is a known issue. 
> I also searched with some relevant terms and log pieces but couldn't find a 
> duplicate. Please deduplicate if any.
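The attempt-count idea suggested above could work roughly like this (a minimal sketch with hypothetical names, not actual Flink code): the resource manager remembers the highest attempt number seen per TaskManager resource ID and drops registrations from older attempts, so a late-arriving R1 cannot unregister the already-accepted R2.

```java
import java.util.HashMap;
import java.util.Map;

// Guards against out-of-order registration messages by tracking the highest
// attempt number seen for each TaskManager resource ID.
public class RegistrationGuard {

    private final Map<String, Integer> latestAttempt = new HashMap<>();

    /** Returns true if the registration should be accepted. */
    public boolean register(String resourceId, int attempt) {
        Integer seen = latestAttempt.get(resourceId);
        if (seen != null && attempt < seen) {
            return false; // stale attempt: drop it instead of re-registering
        }
        latestAttempt.put(resourceId, attempt);
        return true;
    }

    public static void main(String[] args) {
        RegistrationGuard guard = new RegistrationGuard();
        System.out.println(guard.register("tm-1", 2)); // R2 accepted
        System.out.println(guard.register("tm-1", 1)); // late R1 rejected
    }
}
```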



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10052) Tolerate temporarily suspended ZooKeeper connections

2019-04-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/FLINK-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827912#comment-16827912
 ] 

Dominik Wosiński commented on FLINK-10052:
--

Hey, 
Sorry, somehow I have missed the issue. I will fix it ASAP.

 

> Tolerate temporarily suspended ZooKeeper connections
> 
>
> Key: FLINK-10052
> URL: https://issues.apache.org/jira/browse/FLINK-10052
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.4.2, 1.5.2, 1.6.0
>Reporter: Till Rohrmann
>Assignee: Dominik Wosiński
>Priority: Major
>
> This issue results from FLINK-10011 which uncovered a problem with Flink's HA 
> recovery and proposed the following solution to harden Flink:
> The {{ZooKeeperLeaderElectionService}} uses the {{LeaderLatch}} Curator 
> recipe for leader election. The leader latch revokes leadership in case of a 
> suspended ZooKeeper connection. This can be premature if the system can 
> reconnect to ZooKeeper before its session expires. The effect of the lost 
> leadership is that all jobs will be canceled and directly restarted after 
> regaining the leadership.
> Instead of directly revoking the leadership upon a SUSPENDED ZooKeeper 
> connection, it would be better to wait until the ZooKeeper connection is 
> LOST. That way we would allow the system to reconnect and not lose the 
> leadership. This could be achieved by using Curator's {{LeaderSelector}} 
> instead of the {{LeaderLatch}}.
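The proposed behaviour can be sketched as a small state machine (no Curator dependency; the enum constants mirror Curator's ConnectionState names, everything else is hypothetical): leadership is revoked only on LOST, while SUSPENDED merely marks the connection as possibly stale until it is RECONNECTED.

```java
// Sketch: keep leadership through a SUSPENDED connection and only revoke it
// once the ZooKeeper session is actually LOST.
public class LeadershipTracker {

    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private boolean leader = true;
    private boolean suspended = false;

    public void onStateChange(ConnectionState state) {
        switch (state) {
            case SUSPENDED:
                suspended = true;  // connection in doubt, but don't revoke yet
                break;
            case RECONNECTED:
                suspended = false; // session survived, leadership kept
                break;
            case LOST:
                leader = false;    // session expired, revoke leadership now
                break;
            default:
                break;
        }
    }

    public boolean isLeader() { return leader; }

    public static void main(String[] args) {
        LeadershipTracker tracker = new LeadershipTracker();
        tracker.onStateChange(ConnectionState.SUSPENDED);
        System.out.println(tracker.isLeader()); // still leader while suspended
        tracker.onStateChange(ConnectionState.LOST);
        System.out.println(tracker.isLeader()); // leadership revoked on LOST
    }
}
```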



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12219) Yarn application can't stop when flink job failed in per-job yarn cluster mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824919#comment-16824919
 ] 

lamber-ken edited comment on FLINK-12219 at 4/28/19 10:41 AM:
--

no, in detach mode.


was (Author: lamber-ken):
right.

> Yarn application can't stop when flink job failed in per-job yarn cluster mode
> -
>
> Key: FLINK-12219
> URL: https://issues.apache.org/jira/browse/FLINK-12219
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Runtime / REST
>Affects Versions: 1.6.3, 1.8.0
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
> Attachments: fix-bug.patch, image-2019-04-17-15-00-40-687.png, 
> image-2019-04-17-15-02-49-513.png, image-2019-04-23-17-37-00-081.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Issue detail info*
> In our Flink (1.6.3) production environment, I often encounter a situation 
> where the Yarn application can't stop after a Flink job fails in per-job Yarn 
> cluster mode, so I deeply analyzed why it happens.
> When a Flink job fails, the system writes an archive file to a FileSystem 
> through the +MiniDispatcher#archiveExecutionGraph+ method and then notifies 
> the YarnJobClusterEntrypoint to shut down. But if 
> +MiniDispatcher#archiveExecutionGraph+ throws an exception during execution, 
> it affects the following calls.
> So I opened 
> [FLINK-12247|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-12247]
>  to solve the NPE bug when the system writes the archive to the FileSystem. 
> But we still need to consider other exceptions, so we should catch Exception 
> / Throwable, not just IOException.
> h3. *Flink yarn job fail flow*
> !image-2019-04-23-17-37-00-081.png!
> h3. *Flink yarn job fail on yarn*
> !image-2019-04-17-15-00-40-687.png!
>  
> h3. *Flink yarn application can't stop*
> !image-2019-04-17-15-02-49-513.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901
 ] 

lamber-ken edited comment on FLINK-12183 at 4/28/19 10:33 AM:
--

[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there as well.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.2.1 to flink-1.6.3. As you say, it's hard to find and intermittent, so I 
also used UML to describe the flow in 
https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
checking whether an element exists by index is very efficient. Generally, we 
only allow a streaming job to make several attempts, not up to 1000, so simply 
skipping null values seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
 

 


was (Author: lamber-ken):
[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there as well.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
checking whether an element exists by index is very efficient. Generally, we 
only allow a streaming job to make several attempts, not up to 1000, so simply 
skipping null values seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
 

 

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the 
> job restarted many times, e.g. 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationF

[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901
 ] 

lamber-ken edited comment on FLINK-12183 at 4/28/19 10:29 AM:
--

[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there as well.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
checking whether an element exists by index is very efficient. Generally, we 
only allow a streaming job to make several attempts, not up to 1000, so simply 
skipping null values seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
 

 


was (Author: lamber-ken):
[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
checking whether an element exists by index is very efficient. Generally, we 
only allow a streaming job to make several attempts, not up to 1000, so simply 
skipping null values seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
 

 

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the 
> job restarted many times, e.g. 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationFuture(). The Com

[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901
 ] 

lamber-ken edited comment on FLINK-12183 at 4/28/19 10:28 AM:
--

[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
checking whether an element exists by index is very efficient. Generally, we 
only allow a streaming job to make several attempts, not up to 1000, so simply 
skipping null values seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
 

 


was (Author: lamber-ken):
[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
an element's existence can be checked just by index. Simply skipping null values 
seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
  

 

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the 
> job restarted many times, e.g. 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will 
> exit unexpectedly with a NullPointerException in the archiveExecutionGraph() 
> phase. It's

[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901
 ] 

lamber-ken edited comment on FLINK-12183 at 4/28/19 10:20 AM:
--

[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
an element's existence can be checked just by index. Simply skipping null values 
seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
  

 


was (Author: lamber-ken):
[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
an element's existence can be checked just by index. Simply skipping null values 
seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the 
> job restarted many times, e.g. 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will 
> exit unexpectedly with a NullPointerException in the archiveExecutionGraph() 
> phase. It's hard to find because only IOException is caught there. In 
> SubtaskEx

[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread lamber-ken (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901
 ] 

lamber-ken commented on FLINK-12183:


[~Yumeng], hi, what a coincidence! We met the same problem. I'm sorry for 
creating a duplicate issue. When I created FLINK-12247, I only checked the 
latest SubtaskExecutionAttemptDetailsHandler and 
SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found 
that this problem exists there too.

*First,* this problem has been bothering us for a long time. It has existed from 
flink-1.3.2 to flink-1.6.3. As you say, it's hard to find, so I also used UML to 
describe the flow in https://issues.apache.org/jira/browse/FLINK-12219.

*Second,* at the beginning I solved this problem in a way similar to your patch. 
But I don't think that is ideal, because it breaks the interface and 
[ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3]
 has no need to implement it. BTW, priorExecutions is an EvictingBoundedList, so 
an element's existence can be checked just by index. Simply skipping null values 
seems more appropriate to me.

*Third,* to prevent an unexpected RuntimeException, we should move 
`jobTerminationFuture.complete` into a finally block:

{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the 
> job restarted many times, e.g. 1000 times, in a short time.
> The bug is in the archiveExecutionGraph() phase before 
> removeJobAndRegisterTerminationFuture() executes. The CompletableFuture 
> thread exits unexpectedly with a NullPointerException in the 
> archiveExecutionGraph() phase. It's hard to find because only IOException is 
> caught there. In SubtaskExecutionAttemptDetailsHandler and 
> SubtaskExecutionAttemptAccumulatorsHandler, when calling the 
> archiveJsonWithPath() method, JSON information about prior execution attempts 
> is constructed, but the loop index starts from 0, which might be an index 
> already evicted from the list. In that case, getting the prior execution 
> attempt returns null (AccessExecution attempt = 
> subtask.getPriorExecutionAttempt(x)).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12357) Remove useless code in TableConfig

2019-04-28 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-12357:
---

 Summary: Remove useless code in TableConfig
 Key: FLINK-12357
 URL: https://issues.apache.org/jira/browse/FLINK-12357
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Hequn Cheng
Assignee: Hequn Cheng








[jira] [Commented] (FLINK-12175) TypeExtractor.getMapReturnTypes produces different TypeInformation than createTypeInformation for classes with parameterized ancestors

2019-04-28 Thread YangFei (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827895#comment-16827895
 ] 

YangFei commented on FLINK-12175:
-

[~andbul] are you working on this fix? Can I work on this ticket?

> TypeExtractor.getMapReturnTypes produces different TypeInformation than 
> createTypeInformation for classes with parameterized ancestors
> --
>
> Key: FLINK-12175
> URL: https://issues.apache.org/jira/browse/FLINK-12175
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.7.2, 1.9.0
>Reporter: Dylan Adams
>Priority: Major
>
> I expect that the {{TypeMapper}} {{createTypeInformation}} and 
> {{getMapReturnTypes}} would produce equivalent type information for the same 
> type. But when there is a parameterized superclass, this does not appear to 
> be the case.
> Here's a test case that could be added to {{PojoTypeExtractorTest.java}} that 
> demonstrates the issue:
> {code}
> public static class Pojo implements Serializable {
>   public int digits;
>   public String letters;
> }
> public static class ParameterizedParent<T> implements Serializable {
>   public T pojoField;
> }
> public static class ConcreteImpl extends ParameterizedParent<Pojo> {
>   public double precise;
> }
> public static class ConcreteMapFunction implements MapFunction<ConcreteImpl, ConcreteImpl> {
>   @Override
>   public ConcreteImpl map(ConcreteImpl value) throws Exception {
>       return null;
>   }
> }
> @Test
> public void testMapReturnType() {
>   final TypeInformation<ConcreteImpl> directTypeInfo = TypeExtractor.createTypeInfo(ConcreteImpl.class);
>   Assert.assertTrue(directTypeInfo instanceof PojoTypeInfo);
>   TypeInformation<?> directPojoFieldTypeInfo = ((PojoTypeInfo<?>) directTypeInfo).getPojoFieldAt(0).getTypeInformation();
>   Assert.assertTrue(directPojoFieldTypeInfo instanceof PojoTypeInfo);
>   final TypeInformation<ConcreteImpl> mapReturnTypeInfo = TypeExtractor.getMapReturnTypes(new ConcreteMapFunction(), directTypeInfo);
>   Assert.assertTrue(mapReturnTypeInfo instanceof PojoTypeInfo);
>   TypeInformation<?> mapReturnPojoFieldTypeInfo = ((PojoTypeInfo<?>) mapReturnTypeInfo).getPojoFieldAt(0).getTypeInformation();
>   Assert.assertTrue(mapReturnPojoFieldTypeInfo instanceof PojoTypeInfo);
>   Assert.assertEquals(directTypeInfo, mapReturnTypeInfo);
> }
> {code}
> This test case will fail on the last two asserts because 
> {{getMapReturnTypes}} produces a {{TypeInformation}} for {{ConcreteImpl}} 
> with a {{GenericTypeInfo}} for the {{pojoField}}, whereas 
> {{createTypeInformation}} correctly produces a {{PojoTypeInfo}}.





[GitHub] [flink] wujinhu removed a comment on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer

2019-04-28 Thread GitBox
wujinhu removed a comment on issue #7798: [FLINK-11388][fs] Add Aliyun OSS 
recoverable writer
URL: https://github.com/apache/flink/pull/7798#issuecomment-471811588
 
 
   @StefanRRichter would you please help review this PR? Thanks :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] wujinhu commented on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer

2019-04-28 Thread GitBox
wujinhu commented on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable 
writer
URL: https://github.com/apache/flink/pull/7798#issuecomment-487364585
 
 
   Rebase master.




[jira] [Commented] (FLINK-11762) Update WindowOperatorMigrationTest for 1.8

2019-04-28 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827885#comment-16827885
 ] 

vinoyang commented on FLINK-11762:
--

[~yangfei] [https://github.com/apache/flink/pull/8168]

> Update WindowOperatorMigrationTest for 1.8
> --
>
> Key: FLINK-11762
> URL: https://issues.apache.org/jira/browse/FLINK-11762
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Tests
>Affects Versions: 1.8.0
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 1.9.0
>
>
> Update {{WindowOperatorMigrationTest}} so that it covers restoring from 1.8.





[GitHub] [flink] flinkbot commented on issue #8300: [FLINK-11638] Translate Savepoints page into Chinese

2019-04-28 Thread GitBox
flinkbot commented on issue #8300: [FLINK-11638] Translate Savepoints page into 
Chinese
URL: https://github.com/apache/flink/pull/8300#issuecomment-487363436
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

## Bot commands
The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   




[jira] [Commented] (FLINK-11762) Update WindowOperatorMigrationTest for 1.8

2019-04-28 Thread YangFei (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827883#comment-16827883
 ] 

YangFei commented on FLINK-11762:
-

[~yanghua] can you tell me the umbrella issue that contains all of the 
subtasks? I'd like to have a look.

> Update WindowOperatorMigrationTest for 1.8
> --
>
> Key: FLINK-11762
> URL: https://issues.apache.org/jira/browse/FLINK-11762
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream, Tests
>Affects Versions: 1.8.0
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 1.9.0
>
>
> Update {{WindowOperatorMigrationTest}} so that it covers restoring from 1.8.





[jira] [Updated] (FLINK-11638) Translate "Savepoints" page into Chinese

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-11638:
---
Labels: pull-request-available  (was: )

> Translate "Savepoints" page into Chinese
> 
>
> Key: FLINK-11638
> URL: https://issues.apache.org/jira/browse/FLINK-11638
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>
> doc locates in flink/docs/ops/state/savepoints.zh.md





[jira] [Commented] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations

2019-04-28 Thread YangFei (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827882#comment-16827882
 ] 

YangFei commented on FLINK-12141:
-

I am interested in this issue. [~gyfora], are you working on it now? If not, 
I can take it. Thanks.

> Allow @TypeInfo annotation on POJO field declarations
> -
>
> Key: FLINK-12141
> URL: https://issues.apache.org/jira/browse/FLINK-12141
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Type Serialization System
>Reporter: Gyula Fora
>Priority: Minor
>  Labels: starter
>
> The TypeInfo annotation is a great way to declare serializers for custom 
> types however I feel that it's usage is limited by the fact that it can only 
> be used on types that are declared in the project.
> By allowing the annotation to be used on field declarations we could improve 
> the TypeExtractor logic to use the type factory when creating the 
> PojoTypeInformation.
> This would be a big improvement as in many cases classes from other libraries 
> or collection types are used within custom Pojo classes and Flink would 
> default to Kryo serialization which would hurt performance and cause problems 
> later.
> The current workaround in these cases is to implement a custom serializer for 
> the entire pojo which is a waste of effort when only a few fields might 
> require custom serialization logic.
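
Assuming the feature were added, usage might look like the sketch below. Note this is 
hypothetical: the {{@TypeInfo}} annotation here is declared locally as a stand-in for Flink's 
{{org.apache.flink.api.common.typeinfo.TypeInfo}} (which today only targets types, not fields), 
and {{ThirdPartyFactory}} is an invented placeholder for a TypeInfoFactory.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for org.apache.flink.api.common.typeinfo.TypeInfo, declared
// here only so this sketch compiles without Flink on the classpath. The
// proposal is to allow ElementType.FIELD as a target in addition to TYPE.
@Target({ElementType.TYPE, ElementType.FIELD})
@Retention(RetentionPolicy.RUNTIME)
@interface TypeInfo {
    Class<?> value();
}

// Hypothetical TypeInfoFactory for a third-party library type.
class ThirdPartyFactory { }

class MyPojo {
    public int id;

    // Proposed: override type extraction for just this field instead of
    // writing a custom serializer for the whole POJO.
    @TypeInfo(ThirdPartyFactory.class)
    public Object libraryField;
}
```

The TypeExtractor could then consult the field-level factory while building the 
PojoTypeInformation, instead of falling back to Kryo for that one field.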





[GitHub] [flink] XuQianJin-Stars opened a new pull request #8300: [FLINK-11638] Translate Savepoints page into Chinese

2019-04-28 Thread GitBox
XuQianJin-Stars opened a new pull request #8300: [FLINK-11638] Translate 
Savepoints page into Chinese
URL: https://github.com/apache/flink/pull/8300
 
 
   ## What is the purpose of the change
   
   *(This pull request translates the "Savepoints" page into Chinese.)*
   
   
   ## Brief change log
   
 - *doc locates in flink/docs/ops/state/savepoints.zh.md*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   no change to test.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)




[GitHub] [flink] link3280 edited a comment on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread GitBox
link3280 edited a comment on issue #8299: [FLINK-12356][BuildSystem] Optimise 
version experssion of flink-shaded-hadoop2(-uber)
URL: https://github.com/apache/flink/pull/8299#issuecomment-487362995
 
 
   Oops. It seems that these modules would be removed in the future. 
https://github.com/apache/flink/pull/8225




[GitHub] [flink] link3280 commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread GitBox
link3280 commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version 
experssion of flink-shaded-hadoop2(-uber)
URL: https://github.com/apache/flink/pull/8299#issuecomment-487362995
 
 
   Oops. It seems that these modules would be removed in the future. 




[jira] [Updated] (FLINK-12356) Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12356:
---
Labels: pull-request-available  (was: )

> Optimise version experssion of flink-shaded-hadoop2(-uber)
> --
>
> Key: FLINK-12356
> URL: https://issues.apache.org/jira/browse/FLINK-12356
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.8.1
>Reporter: Paul Lin
>Assignee: Paul Lin
>Priority: Trivial
>  Labels: pull-request-available
>
> Since the new version scheme for hadoop-based modules, we use version 
> literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`, and it can be 
> replaced by `${parent.version}` variable for better management.





[GitHub] [flink] flinkbot commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread GitBox
flinkbot commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version 
experssion of flink-shaded-hadoop2(-uber)
URL: https://github.com/apache/flink/pull/8299#issuecomment-487362574
 
 




[GitHub] [flink] link3280 opened a new pull request #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread GitBox
link3280 opened a new pull request #8299: [FLINK-12356][BuildSystem] Optimise 
version experssion of flink-shaded-hadoop2(-uber)
URL: https://github.com/apache/flink/pull/8299
 
 
   ## What is the purpose of the change
   
   Since the new version scheme for hadoop-based modules, we use version 
literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`, and it can be 
replaced by `${parent.version}` variable for better management.
   
   ## Brief change log
   
   - Use `${parent.version}` to replace version literals in version expressions.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no 
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   




[jira] [Closed] (FLINK-12347) flink-table-runtime-blink is missing scala suffix

2019-04-28 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-12347.

Resolution: Fixed

master: 6bc25c2c3691d9bd2c12fcb0b3c123cf5ab70b7b

> flink-table-runtime-blink is missing scala suffix
> -
>
> Key: FLINK-12347
> URL: https://issues.apache.org/jira/browse/FLINK-12347
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Table SQL / Runtime
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: Liya Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{flink-table-runtime-blink}} has a dependency on {{flink-streaming-java}} 
> and thus requires a scala suffix.
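> 
> The fix roughly amounts to adding the Scala suffix to the module's artifactId in its pom 
(illustrative fragment, not the actual pom contents):

```xml
<!-- before -->
<artifactId>flink-table-runtime-blink</artifactId>

<!-- after: carry the Scala version of the transitive flink-streaming-java dependency -->
<artifactId>flink-table-runtime-blink_${scala.binary.version}</artifactId>
```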





[GitHub] [flink] zentol merged pull request #8293: [FLINK-12347][Table SQL/Runtime]add scala suffix for flink-table-runt…

2019-04-28 Thread GitBox
zentol merged pull request #8293: [FLINK-12347][Table SQL/Runtime]add scala 
suffix for flink-table-runt…
URL: https://github.com/apache/flink/pull/8293
 
 
   




[jira] [Created] (FLINK-12356) Optimise version experssion of flink-shaded-hadoop2(-uber)

2019-04-28 Thread Paul Lin (JIRA)
Paul Lin created FLINK-12356:


 Summary: Optimise version experssion of flink-shaded-hadoop2(-uber)
 Key: FLINK-12356
 URL: https://issues.apache.org/jira/browse/FLINK-12356
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.8.1
Reporter: Paul Lin
Assignee: Paul Lin


Since the introduction of the new version scheme for hadoop-based modules, we use version 
literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; these can be replaced by the 
`${parent.version}` variable for better maintainability.
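
Concretely, the change replaces a repeated version literal with the Maven property 
(illustrative fragment; artifact and version values are placeholders, not the actual pom 
contents):

```xml
<!-- before: version literal repeated in the child module -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-shaded-hadoop2</artifactId>
  <version>1.8.1</version>
</dependency>

<!-- after: inherit the version from the parent pom -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-shaded-hadoop2</artifactId>
  <version>${parent.version}</version>
</dependency>
```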





[GitHub] [flink] eaglewatcherwb closed pull request #8298: test

2019-04-28 Thread GitBox
eaglewatcherwb closed pull request #8298: test
URL: https://github.com/apache/flink/pull/8298
 
 
   




[GitHub] [flink] flinkbot commented on issue #8298: test

2019-04-28 Thread GitBox
flinkbot commented on issue #8298: test
URL: https://github.com/apache/flink/pull/8298#issuecomment-487359619
 
 




[GitHub] [flink] eaglewatcherwb opened a new pull request #8298: test

2019-04-28 Thread GitBox
eaglewatcherwb opened a new pull request #8298: test
URL: https://github.com/apache/flink/pull/8298
 
 
   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   




[jira] [Commented] (FLINK-12351) AsyncWaitOperator should deep copy StreamElement when object reuse is enabled

2019-04-28 Thread aitozi (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827861#comment-16827861
 ] 

aitozi commented on FLINK-12351:


Hi, [~jark]

I just checked the object-reuse config in other operators. It is mostly checked in 
batch-related operators. Is only AsyncWaitOperator affected? I think other operators may be 
affected too. If a user enables object reuse without paying attention to the side effects of 
mutating the objects, as the docs of ExecutionConfig#enableObjectReuse warn, the results may 
be unpredictable. Alternatively, we could make object reuse an operator-level setting, so 
users can configure the behaviour of each operator individually. What's your idea?

> AsyncWaitOperator should deep copy StreamElement when object reuse is enabled
> -
>
> Key: FLINK-12351
> URL: https://issues.apache.org/jira/browse/FLINK-12351
> Project: Flink
>  Issue Type: Bug
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.9.0
>
>
> Currently, AsyncWaitOperator directly put the input StreamElement into 
> {{StreamElementQueue}}. But when object reuse is enabled, the StreamElement 
> is reused, which means the element in {{StreamElementQueue}} will be 
> modified. As a result, the output of AsyncWaitOperator might be wrong.
> An easy way to fix this might be deep copy the input StreamElement when 
> object reuse is enabled, like this: 
> https://github.com/apache/flink/blob/blink/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/async/AsyncWaitOperator.java#L209
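> 
> The proposed guard can be sketched in plain Java (a simplified stand-in: {{ElementQueue}} 
and the copy function here are illustrative; the real operator would use the record's 
TypeSerializer to perform the deep copy):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.UnaryOperator;

// Sketch: when object reuse is enabled, the operator must enqueue a deep
// copy, because the caller will mutate the same instance for the next record.
class ElementQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final boolean objectReuseEnabled;
    private final UnaryOperator<T> deepCopy; // e.g. TypeSerializer#copy in Flink

    ElementQueue(boolean objectReuseEnabled, UnaryOperator<T> deepCopy) {
        this.objectReuseEnabled = objectReuseEnabled;
        this.deepCopy = deepCopy;
    }

    void put(T element) {
        // Deep-copy only when reuse is on; otherwise ownership transfers safely.
        queue.add(objectReuseEnabled ? deepCopy.apply(element) : element);
    }

    T poll() {
        return queue.poll();
    }
}
```

With reuse enabled, the queued element stays unchanged even though the caller mutates the 
reused instance afterwards; without the copy, the queued element would silently change.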





[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode

2019-04-28 Thread Yumeng Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827860#comment-16827860
 ] 

Yumeng Zhang edited comment on FLINK-12183 at 4/28/19 8:41 AM:
---

FLINK-12247 seems to fix this problem, but it looks too blunt: it just skips the null 
elements. If the subtask has many attempts, say 1000, it will still traverse all 1000 of 
them. I don't think this truly fixes the problem. We should understand why there are null 
elements and fix that, rather than ignore the cause.


was (Author: yumeng):
FLINK-12247 seems to fix this problem, but it looks too brutal, just skips the 
null elements. If the subtask has many attempts like 1000 times, it will 
traverse the subtask's attempts 1000 times. I don't think this way truly fixes 
the problem.

> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> 
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Yumeng Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancel a running job if the job 
> restarted many times, like 1000 times, in a short time.
> The bug is in archiveExecutionGraph() phase before executing 
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will 
> exit unexpectedly with NullPointerException in archiveExecutionGraph() phase. 
> It's hard to find that because here it only catches IOException. In 
> SubtaskExecutionAttemptDetailsHandler and  
> SubtaskExecutionAttemptAccumulatorsHandler, when calling 
> archiveJsonWithPath() method, it will construct some json information about 
> prior execution attempts but the index is from 0 which might be dropped index 
> for the for loop.  In default, it will return null when trying to get the 
> prior execution attempt (AccessExecution attempt = 
> subtask.getPriorExecutionAttempt(x)).




