[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828949#comment-16828949 ] Till Rohrmann commented on FLINK-12342: --- Before diving into the implementation, I would first like to fully understand the problem. Concretely, a Yarn reference explaining this behaviour would be good. Otherwise we might simply fix a symptom, or not fix the problem at all. > Yarn Resource Manager Acquires Too Many Containers > -- > > Key: FLINK-12342 > URL: https://issues.apache.org/jira/browse/FLINK-12342 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Environment: We run jobs in Flink release 1.6.3. >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In the current implementation of YarnFlinkResourceManager, containers are acquired > one by one as requests arrive from the SlotManager. The mechanism > works when the job is small, say fewer than 32 containers. If the job needs 256 > containers, they can't all be allocated immediately, and the pending requests > in AMRMClient are not removed accordingly. We observed the AMRMClient asking for > the current pending requests + 1 (the new request from the slot > manager) containers. In this way, during the start-up of such a job, it asked > for 4000+ containers. If an external dependency issue occurs, for > example slow HDFS access, the whole job is blocked without > getting enough resources and is finally killed with a SlotManager request timeout. > Thus, we should use the total number of containers asked for, rather than the pending > requests in AMRMClient, as the threshold when deciding whether to add > one more resource request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
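The threshold logic proposed in the issue description can be sketched as a small, self-contained decision function (hypothetical names; the real logic lives in YarnFlinkResourceManager and its AMRMClient interaction):

```java
// Minimal sketch of the proposed fix (hypothetical names, not the actual
// YarnFlinkResourceManager code): track the total number of containers already
// requested and compare it against the required total, instead of re-asking
// for "pending requests + 1" on every SlotManager request.
public class ContainerRequestTracker {

    private int totalRequested; // containers asked for since job start

    /**
     * Decides whether one more container request should be sent to reach the
     * given target. Returns true (and counts the request) only while the total
     * requested so far is below the target, so the tracker never over-asks.
     */
    public boolean shouldRequestContainer(int requiredContainers) {
        if (totalRequested < requiredContainers) {
            totalRequested++;
            return true;
        }
        return false; // enough requests already in flight
    }

    public int getTotalRequested() {
        return totalRequested;
    }
}
```

With this bookkeeping, even if allocation is slow (e.g. during an HDFS hiccup), repeated SlotManager requests cannot inflate the request count past the actual need.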
[GitHub] [flink] beyond1920 commented on issue #8294: [FLINK-12348][table-planner-blink]Use TableConfig in api module to replace TableConfig in blink-planner module.
beyond1920 commented on issue #8294: [FLINK-12348][table-planner-blink]Use TableConfig in api module to replace TableConfig in blink-planner module. URL: https://github.com/apache/flink/pull/8294#issuecomment-487457504 @KurtYoung , after discussing with @hequn8128 offline, the final TableConfig has two levels of configuration (BTW, there is an unused field `userDefinedConfig` in the current `TableConfig`; @hequn8128 will delete it soon): * common configuration shared among multiple planners. * configuration for each specific planner, which implements the interface `PlannerConfig`. For example, introduce `PlannerConfigImpl` in the Blink planner module for the Blink planner. Each planner may have different ConfigOptions, which could live in the API package of each planner module while multiple planners exist at the same time. After the current planner is deprecated and only one planner is retained, we could move its `PlannerConfig` implementation and its `ConfigOptions` to the API module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
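The two-level layout described in the comment could look roughly like the following sketch (hypothetical names such as `TableConfigSketch` and `BlinkPlannerConfig`; only the `PlannerConfig` interface name is taken from the comment):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the two-level configuration described above (hypothetical names):
// common options shared by all planners plus one planner-specific object
// implementing the PlannerConfig interface.
public class TableConfigSketch {

    /** Marker interface each planner implements for its own options. */
    public interface PlannerConfig {}

    /** Hypothetical planner-specific configuration, e.g. for the Blink planner. */
    public static class BlinkPlannerConfig implements PlannerConfig {
        public boolean subPlanReuseEnabled;
    }

    // Level 1: common configuration among multiple planners.
    private final Map<String, String> commonOptions = new HashMap<>();

    // Level 2: configuration for the currently selected planner.
    private PlannerConfig plannerConfig;

    public void setCommonOption(String key, String value) {
        commonOptions.put(key, value);
    }

    public String getCommonOption(String key) {
        return commonOptions.get(key);
    }

    public void setPlannerConfig(PlannerConfig config) {
        this.plannerConfig = config;
    }

    /** Planner code retrieves its own config type, if one is set and matches. */
    public <T extends PlannerConfig> Optional<T> getPlannerConfig(Class<T> type) {
        return type.isInstance(plannerConfig)
                ? Optional.of(type.cast(plannerConfig))
                : Optional.empty();
    }
}
```

The downcast-by-type accessor keeps the API module free of any planner-specific class: each planner asks for its own config type and gets an empty Optional if a different planner is active.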
[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-487447400 @flinkbot attention @zentol
[GitHub] [flink] flinkbot commented on issue #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim…
flinkbot commented on issue #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim… URL: https://github.com/apache/flink/pull/8308#issuecomment-487442905 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] WeiZhong94 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on issue #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#issuecomment-487442930 @sunjincheng121 thanks for your review! I have rebased this PR and addressed your comments.
[jira] [Updated] (FLINK-12361) Remove useless expression from runtime scheduler
[ https://issues.apache.org/jira/browse/FLINK-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12361: --- Labels: pull-request-available (was: ) > Remove useless expression from runtime scheduler > > > Key: FLINK-12361 > URL: https://issues.apache.org/jira/browse/FLINK-12361 > Project: Flink > Issue Type: Improvement > Components: Runtime / Operators >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Minor > Labels: pull-request-available > Attachments: image-2019-04-29-11-16-13-492.png > > > In the scheduleTask method of the Scheduler class, the expression > forceExternalLocation is useless, since it always evaluates to false: > !image-2019-04-29-11-16-13-492.png! > So it can be removed. Moreover, by removing this expression, the code > structure can be made much simpler, because some branches rely on > this expression, and they can also be removed.
[GitHub] [flink] liyafan82 opened a new pull request #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim…
liyafan82 opened a new pull request #8308: [FLINK-12361][Runtime/Operators]remove useless expression from runtim… URL: https://github.com/apache/flink/pull/8308 …e scheduler ## What is the purpose of the change Remove the useless expression forceExternalLocation from method Scheduler#scheduleTask ## Brief change log - Remove this expression from the method. - Remove the conditional branches dependent on this expression, since this expression always evaluates to false. ## Verifying this change Verified by manually running the tests in flink-runtime. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no)
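The change is ordinary dead-code elimination for a condition that provably evaluates to false; schematically (illustrative code, not the actual Scheduler):

```java
// Schematic illustration of the PR (not the actual Scheduler code): when a
// guard like forceExternalLocation always evaluates to false, the guarded
// branch is dead, and both the guard and the branch can be removed without
// changing observable behavior.
public class DeadBranchExample {

    // Before: the first branch is unreachable because the flag is always false.
    static String scheduleBefore(boolean forceExternalLocation) {
        if (forceExternalLocation) {
            return "schedule at forced external location"; // dead code
        }
        return "schedule with locality preferences";
    }

    // After: same observable behavior, simpler control flow.
    static String scheduleAfter() {
        return "schedule with locality preferences";
    }
}
```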
[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279230381 ## File path: flink-python/flink-python-table/src/main/python/pyflink/table/__init__.py ## @@ -0,0 +1,37 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from pyflink.table.table import Table +from pyflink.table.table_config import TableConfig +from pyflink.table.table_environment import TableEnvironment, StreamTableEnvironment, BatchTableEnvironment +from pyflink.table.table_sink import TableSink, CsvTableSink +from pyflink.table.table_source import TableSource, CsvTableSource +from pyflink.table.types import DataTypes + Review comment: Fixed
[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279230371 ## File path: flink-python/flink-python-table/src/main/python/setup.py ## @@ -0,0 +1,32 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from distutils.core import setup + +setup( Review comment: Fixed. Now setup.py checks whether the Python version is 2.7 or above.
[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279230215 ## File path: flink-python/flink-python-table/src/main/python/setup.py ## @@ -0,0 +1,32 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from distutils.core import setup + +setup( +name='pyflink', +version='1.0', Review comment: Yes, it makes sense to me. Fixed
[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279230115 ## File path: flink-python/flink-python-table/src/main/python/setup.py ## @@ -0,0 +1,32 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from distutils.core import setup + +setup( +name='pyflink', +version='1.0', +packages=['pyflink', + 'pyflink.table', + 'pyflink.util'], +url='http://flink.apache.org', +license='Licensed under the Apache License, Version 2.0', +author='Flink Developers', +author_email='d...@flink.apache.org', +install_requires=['py4j==0.10.8.1'], +description='Flink Python API' Review comment: Fixed
[GitHub] [flink] WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support
WeiZhong94 commented on a change in pull request #8267: [FLINK-12311][table][python] Add base python framework and Add Scan, Projection, and Filter operator support URL: https://github.com/apache/flink/pull/8267#discussion_r279230110 ## File path: docs/dev/table/tableApi.md ## @@ -297,6 +297,81 @@ val result = orders.where('b === "red") + Review comment: I agree that a README.md file is necessary. I added it in the new commit.
[jira] [Created] (FLINK-12361) Remove useless expression from runtime scheduler
Liya Fan created FLINK-12361: Summary: Remove useless expression from runtime scheduler Key: FLINK-12361 URL: https://issues.apache.org/jira/browse/FLINK-12361 Project: Flink Issue Type: Improvement Components: Runtime / Operators Reporter: Liya Fan Assignee: Liya Fan Attachments: image-2019-04-29-11-16-13-492.png In the scheduleTask method of the Scheduler class, the expression forceExternalLocation is useless, since it always evaluates to false: !image-2019-04-29-11-16-13-492.png! So it can be removed. Moreover, by removing this expression, the code structure can be made much simpler, because some branches rely on this expression, and they can also be removed.
[GitHub] [flink] lamber-ken commented on a change in pull request #8268: [FLINK-12246][runtime] Fix can't get max_attempts_history_size value when create ExecutionVertex
lamber-ken commented on a change in pull request #8268: [FLINK-12246][runtime] Fix can't get max_attempts_history_size value when create ExecutionVertex URL: https://github.com/apache/flink/pull/8268#discussion_r279229762 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ## @@ -816,6 +817,9 @@ public void attachJobGraph(List topologiallySorted) throws JobExcepti final ArrayList newExecJobVertices = new ArrayList<>(topologiallySorted.size()); final long createTimestamp = System.currentTimeMillis(); + int maxPriorAttemptsHistoryLength = jobInformation.getJobConfiguration().getInteger( Review comment: @tillrohrmann, hi, I have a question: `ExecutionGraphBuilder.buildGraph` is called from `JobMaster`, but there is only one `jobMasterConfiguration` in the JobMaster class. Can jobMasterConfiguration represent the cluster configuration?
[jira] [Updated] (FLINK-12360) Translate "Jobs and Scheduling" Page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armstrong Nova updated FLINK-12360: --- Description: Translate the internal page "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" to Chinese Translate "flink/docs/internals/job_scheduling.md" into "flink/docs/internals/job_scheduling.zh.md" was: Translate the internal page "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" to Chinese Summary: Translate "Jobs and Scheduling" Page into Chinese (was: Translate "Jobs and Scheduling" Page to Chinese) > Translate "Jobs and Scheduling" Page into Chinese > - > > Key: FLINK-12360 > URL: https://issues.apache.org/jira/browse/FLINK-12360 > Project: Flink > Issue Type: Task > Components: chinese-translation, Documentation >Reporter: Armstrong Nova >Assignee: Armstrong Nova >Priority: Major > > Translate the internal page > "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" > to Chinese > Translate "flink/docs/internals/job_scheduling.md" into > "flink/docs/internals/job_scheduling.zh.md" >
[GitHub] [flink] wuchong commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
wuchong commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302#issuecomment-487433247 Hi @godfreyhe , I added the `JoinPushExpressionsRule.INSTANCE` rule to the predicate pushdown stage to make sure calculation expressions can be pushed down (so that `ON a+1 = b` is an equi-join). That's why several plan test results changed. I have checked that the changes are reasonable. It would be nice if you could have a look too.
[GitHub] [flink] wuchong edited a comment on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
wuchong edited a comment on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302#issuecomment-487433247 Hi @godfreyhe , I added the `JoinPushExpressionsRule.INSTANCE` rule to the predicate pushdown stage to make sure calculation expressions can be pushed down (so that `ON mod(a, 4) = b` is an equi-join). That's why several plan test results changed. I have checked that the changes are reasonable. It would be nice if you could have a look too.
[jira] [Created] (FLINK-12360) Translate "Jobs and Scheduling" Page to Chinese
Armstrong Nova created FLINK-12360: -- Summary: Translate "Jobs and Scheduling" Page to Chinese Key: FLINK-12360 URL: https://issues.apache.org/jira/browse/FLINK-12360 Project: Flink Issue Type: Task Components: chinese-translation, Documentation Reporter: Armstrong Nova Assignee: Armstrong Nova Translate the internal page "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" to Chinese
[jira] [Updated] (FLINK-12360) Translate "Jobs and Scheduling" Page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armstrong Nova updated FLINK-12360: --- Description: Translate the internal page "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" to Chinese The doc is located at "flink/docs/internals/job_scheduling.md", and the translated doc goes to "flink/docs/internals/job_scheduling.zh.md" was: Translate the internal page "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" to Chinese Translate "flink/docs/internals/job_scheduling.md" into "flink/docs/internals/job_scheduling.zh.md" > Translate "Jobs and Scheduling" Page into Chinese > - > > Key: FLINK-12360 > URL: https://issues.apache.org/jira/browse/FLINK-12360 > Project: Flink > Issue Type: Task > Components: chinese-translation, Documentation >Reporter: Armstrong Nova >Assignee: Armstrong Nova >Priority: Major > > Translate the internal page > "[https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html]" > to Chinese > The doc is located at "flink/docs/internals/job_scheduling.md", and the translated > doc goes to "flink/docs/internals/job_scheduling.zh.md" >
[GitHub] [flink] sunjincheng121 commented on a change in pull request #8230: [FLINK-10977][table] Add streaming non-window FlatAggregate to Table API
sunjincheng121 commented on a change in pull request #8230: [FLINK-10977][table] Add streaming non-window FlatAggregate to Table API URL: https://github.com/apache/flink/pull/8230#discussion_r279223257 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/expressions/AggregateFunctionDefinition.java ## @@ -31,13 +31,13 @@ @PublicEvolving public final class AggregateFunctionDefinition extends FunctionDefinition { - private final AggregateFunction aggregateFunction; + private final UserDefinedAggregateFunction aggregateFunction; private final TypeInformation resultTypeInfo; private final TypeInformation accumulatorTypeInfo; public AggregateFunctionDefinition( String name, - AggregateFunction aggregateFunction, + UserDefinedAggregateFunction aggregateFunction, Review comment: Yes, makes sense to me if we can share more logic between Agg and Table Agg, and we do not need to add a new type.
[jira] [Commented] (FLINK-12351) AsyncWaitOperator should deep copy StreamElement when object reuse is enabled
[ https://issues.apache.org/jira/browse/FLINK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828869#comment-16828869 ] Jark Wu commented on FLINK-12351: - Hi [~aitozi], I think fixing the bug in AsyncWaitOperator and enabling objectReuse at the operator level are two orthogonal problems. We can create another JIRA to discuss the operator-level object reuse problem. Currently, I have only found the AsyncWaitOperator to be affected, because it doesn't deep copy the input record before putting it into the heap buffer (a Java ArrayDeque). IMO, no matter whether object reuse is enabled or not, the AsyncWaitOperator should output the same result, because it is framework code, not user code. Hi [~till.rohrmann], what do you think about this? If you don't object, I can create a PR for this. > AsyncWaitOperator should deep copy StreamElement when object reuse is enabled > - > > Key: FLINK-12351 > URL: https://issues.apache.org/jira/browse/FLINK-12351 > Project: Flink > Issue Type: Bug >Reporter: Jark Wu >Priority: Major > Fix For: 1.9.0 > > > Currently, AsyncWaitOperator directly puts the input StreamElement into > {{StreamElementQueue}}. But when object reuse is enabled, the StreamElement > is reused, which means the element in {{StreamElementQueue}} will be > modified. As a result, the output of AsyncWaitOperator might be wrong. > An easy way to fix this might be to deep copy the input StreamElement when > object reuse is enabled, like this: > https://github.com/apache/flink/blob/blink/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/async/AsyncWaitOperator.java#L209
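The hazard and the proposed fix can be illustrated with plain Java (stand-in types, not the actual Flink classes; a mutable record plays the role of a reused StreamElement):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java illustration of the object-reuse hazard described in the issue
// (stand-in types, not the actual Flink classes): buffering the reused mutable
// instance lets a later mutation rewrite queued elements, while deep-copying
// before buffering preserves them.
public class ReuseBufferExample {

    /** Stand-in for a reused StreamElement carrying a mutable payload. */
    public static final class Record {
        public int value;

        public Record deepCopy() {
            Record copy = new Record();
            copy.value = this.value;
            return copy;
        }
    }

    /** Buggy variant: enqueues the reused instance itself. */
    public static void bufferDirect(Deque<Record> queue, Record reused) {
        queue.add(reused);
    }

    /** Fixed variant: deep-copies before enqueuing, as proposed for object-reuse mode. */
    public static void bufferWithCopy(Deque<Record> queue, Record reused) {
        queue.add(reused.deepCopy());
    }
}
```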
[GitHub] [flink] flinkbot edited a comment on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
flinkbot edited a comment on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-484369983 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❗ 3. Needs [attention] from. - Needs attention by @kurtyoung, @zentol [PMC] * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good.
[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-487431946 Synced with @zentol offline. We cannot depend on flink-shaded-hadoop2 because it only publishes SNAPSHOT jars for hadoop 2.4, but not 2.6 and above. Thus we decided to depend on hadoop directly with "provided" scope. @KurtYoung would be great to have you take a final look and merge if everything looks good
[GitHub] [flink] bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module
bowenli86 commented on issue #8205: [FLINK-12238] [hive] Support database related operations in GenericHiveMetastoreCatalog and setup flink-connector-hive module URL: https://github.com/apache/flink/pull/8205#issuecomment-487431970 @flinkbot attention @KurtYoung
[jira] [Comment Edited] (FLINK-12358) Verify whether rest documenation needs to be updated when building pull request
[ https://issues.apache.org/jira/browse/FLINK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828843#comment-16828843 ] Chesnay Schepler edited comment on FLINK-12358 at 4/28/19 9:01 PM: --- I doubt that a diff-based approach will work properly here. For one, there's a maintenance overhead, since every new relevant file (i.e. header) must be covered, but their location is arbitrary. You'd also have to check whether changes in the file actually change semantics; for example, a javadoc change doesn't require regeneration. Finally, you wouldn't be able to determine whether changes to the REST API docs are in sync with other changes made. Given that we already have utilities for reading/writing Rest API snapshots (see RestAPIStabilityTest) and examples for parsing the existing html (see ConfigOptionsCompletenessTest), I'd be very much in favor of following these examples. was (Author: zentol): I doubt that a diff-based approach will work properly here. For one, there's a maintenance overhead, since every new relevant file (i.e. header) must be covered, but their location is arbitrary. You'd also have to check whether changes in the file actually change semantics; for example, a javadoc change doesn't require regeneration. Given that we already have utilities for reading/writing Rest API snapshots (see RestAPIStabilityTest) and examples for parsing the existing html (see ConfigOptionsCompletenessTest), I'd be very much in favor of a similar approach. > Verify whether rest documenation needs to be updated when building pull > request > --- > > Key: FLINK-12358 > URL: https://issues.apache.org/jira/browse/FLINK-12358 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Yun Tang >Assignee: Yun Tang >Priority: Minor > > Currently, unlike the configuration docs, the REST API docs have no method to > check whether they are up to date with the latest code. This is really annoying and not easy to > track if only checked by developers. > I plan to add a check in Travis to verify whether any files have been updated, > by using `git status`.
[jira] [Commented] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request
[ https://issues.apache.org/jira/browse/FLINK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828843#comment-16828843 ] Chesnay Schepler commented on FLINK-12358: -- I doubt that a diff-based approach will work properly here. For one, there is a maintenance overhead, since every new relevant file (i.e. header) must be covered, but their locations are arbitrary. You would also have to check whether changes in the file actually change semantics; for example, a javadoc change does not require regeneration. Given that we already have utilities for reading/writing REST API snapshots (see RestAPIStabilityTest) and examples for parsing the existing HTML (see ConfigOptionsCompletenessTest), I'd be very much in favor of a similar approach. > Verify whether rest documentation needs to be updated when building pull > request > --- > > Key: FLINK-12358 > URL: https://issues.apache.org/jira/browse/FLINK-12358 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Yun Tang >Assignee: Yun Tang >Priority: Minor > > Currently, unlike the configuration docs, the REST API docs have no way to > check whether they are up to date with the latest code. This is really annoying and hard to > track if it is only checked manually by developers. > I plan to add a check in Travis that verifies whether any files have been updated > by using `git status`.
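The snapshot-style check Chesnay suggests can be sketched roughly as follows. This is a minimal illustration, not the actual Flink utilities: the helper class and method are hypothetical, and the real starting points named in the comment are RestAPIStabilityTest and ConfigOptionsCompletenessTest.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a snapshot comparison: regenerate the docs in
 *  memory and compare them with the checked-in file, failing the build
 *  with an actionable message on any mismatch. */
final class RestDocsSnapshotCheck {

    static void verify(String regeneratedDocs, Path checkedInDocs) throws java.io.IOException {
        // Read the committed docs and require byte-for-byte equality with the regenerated output.
        String committed = new String(Files.readAllBytes(checkedInDocs), StandardCharsets.UTF_8);
        if (!committed.equals(regeneratedDocs)) {
            throw new AssertionError(
                    "REST API docs are out of date; regenerate them and commit the result.");
        }
    }
}
```

Unlike a raw diff, such a test runs in the normal build and fails with a message telling the contributor exactly what to regenerate.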
[jira] [Commented] (FLINK-12175) TypeExtractor.getMapReturnTypes produces different TypeInformation than createTypeInformation for classes with parameterized ancestors
[ https://issues.apache.org/jira/browse/FLINK-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828822#comment-16828822 ] Andrey Bulgakov commented on FLINK-12175: - Yes, I am working on this. I have found a solution and am working on a pull request. > TypeExtractor.getMapReturnTypes produces different TypeInformation than > createTypeInformation for classes with parameterized ancestors > -- > > Key: FLINK-12175 > URL: https://issues.apache.org/jira/browse/FLINK-12175 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.7.2, 1.9.0 >Reporter: Dylan Adams >Priority: Major > > I expect that {{TypeExtractor}}'s {{createTypeInformation}} and > {{getMapReturnTypes}} would produce equivalent type information for the same > type. But when there is a parameterized superclass, this does not appear to > be the case. > Here's a test case that could be added to {{PojoTypeExtractorTest.java}} that > demonstrates the issue: > {code} > public static class Pojo implements Serializable { > public int digits; > public String letters; > } > public static class ParameterizedParent<T> implements > Serializable { > public T pojoField; > } > public static class ConcreteImpl extends ParameterizedParent<Pojo> { > public double precise; > } > public static class ConcreteMapFunction implements MapFunction<ConcreteImpl, > ConcreteImpl> { > @Override > public ConcreteImpl map(ConcreteImpl value) throws Exception { > return null; > } > } > @Test > public void testMapReturnType() { > final TypeInformation<ConcreteImpl> directTypeInfo = > TypeExtractor.createTypeInfo(ConcreteImpl.class); > Assert.assertTrue(directTypeInfo instanceof PojoTypeInfo); > TypeInformation<?> directPojoFieldTypeInfo = ((PojoTypeInfo<?>) > directTypeInfo).getPojoFieldAt(0).getTypeInformation(); > Assert.assertTrue(directPojoFieldTypeInfo instanceof PojoTypeInfo); > final TypeInformation<ConcreteImpl> mapReturnTypeInfo > = TypeExtractor.getMapReturnTypes(new ConcreteMapFunction(),
directTypeInfo); > Assert.assertTrue(mapReturnTypeInfo instanceof PojoTypeInfo); > TypeInformation<?> mapReturnPojoFieldTypeInfo = ((PojoTypeInfo<?>) > mapReturnTypeInfo).getPojoFieldAt(0).getTypeInformation(); > Assert.assertTrue(mapReturnPojoFieldTypeInfo instanceof PojoTypeInfo); > Assert.assertEquals(directTypeInfo, mapReturnTypeInfo); > } > {code} > This test case will fail on the last two asserts because > {{getMapReturnTypes}} produces a {{TypeInformation}} for {{ConcreteImpl}} > with a {{GenericTypeInfo}} for the {{pojoField}}, whereas > {{createTypeInformation}} correctly produces a {{PojoTypeInfo}}.
[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancelling a running job in per-job Yarn mode
[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828776#comment-16828776 ] lamber-ken commented on FLINK-12183: (y) > Job Cluster doesn't stop after cancelling a running job in per-job Yarn mode > > > Key: FLINK-12183 > URL: https://issues.apache.org/jira/browse/FLINK-12183 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Yumeng Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The per-job Yarn cluster doesn't stop after cancelling a running job if the job > has restarted many times, e.g. 1000 times, in a short period. > The bug is in the archiveExecutionGraph() phase, before > removeJobAndRegisterTerminationFuture() is executed. The CompletableFuture thread > exits unexpectedly with a NullPointerException in the archiveExecutionGraph() phase. > This is hard to spot because only IOException is caught there. In > SubtaskExecutionAttemptDetailsHandler and > SubtaskExecutionAttemptAccumulatorsHandler, the > archiveJsonWithPath() method constructs some JSON information about > prior execution attempts, but the loop index starts from 0 and may refer to an > attempt that has already been dropped. By default, trying to get such a > prior execution attempt (AccessExecution attempt = > subtask.getPriorExecutionAttempt(x)) returns null.
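The failure mode described above can be modeled with a small sketch. The types and names here are illustrative, not Flink's actual classes: the point is that an archiving loop which looks up attempts by index must tolerate null entries instead of dereferencing them and killing the archiving future with a NullPointerException.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the fix: skip dropped (null) prior execution
 *  attempts rather than dereferencing them. */
final class PriorAttemptArchiver {

    static List<String> archive(List<String> priorAttempts) {
        List<String> archived = new ArrayList<>();
        for (String attempt : priorAttempts) {
            if (attempt == null) {
                // The index belonged to a dropped attempt; nothing to archive.
                continue;
            }
            archived.add("archived:" + attempt);
        }
        return archived;
    }
}
```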
[GitHub] [flink] flinkbot commented on issue #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase
flinkbot commented on issue #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase URL: https://github.com/apache/flink/pull/8307#issuecomment-487392775 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12359) SystemResourcesMetricsITCase unstable
[ https://issues.apache.org/jira/browse/FLINK-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12359: --- Labels: pull-request-available (was: ) > SystemResourcesMetricsITCase unstable > - > > Key: FLINK-12359 > URL: https://issues.apache.org/jira/browse/FLINK-12359 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics, Tests >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > > The {{SystemResourcesMetricsITCase}} checks that task managers register a > specific set of metrics if configured to do so. The test assumes that the TM > has already started completely when the test begins, but this may not be the > case.
[GitHub] [flink] zentol opened a new pull request #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase
zentol opened a new pull request #8307: [FLINK-12359][metrics][tests] Harden SystemResourcesMetricsITCase URL: https://github.com/apache/flink/pull/8307 ## What is the purpose of the change Hardens the `SystemResourcesMetricsITCase` against * slow starts of a TaskManager, in which case metrics could be registered later than the test expects * the host portion of the metric identifier not being `localhost` We now exclude the host from the identifier, and generate a future in the reporter that we wait on instead.
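The "future in the reporter that we wait on" idea can be sketched as follows. This is an illustrative model with hypothetical names, not the actual Flink test code: a test reporter completes a future once every expected metric has been registered, so the test blocks on that future instead of assuming the TaskManager finished starting.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical test reporter: completes a future when all expected
 *  metrics have been registered by the metric system. */
final class WaitingReporter {
    private final Set<String> expected;
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    final CompletableFuture<Void> allRegistered = new CompletableFuture<>();

    WaitingReporter(Set<String> expectedMetricNames) {
        this.expected = expectedMetricNames;
    }

    /** Would be called by the metric system for each newly registered metric. */
    void notifyOfAddedMetric(String metricName) {
        if (expected.contains(metricName)) {
            seen.add(metricName);
            if (seen.size() == expected.size()) {
                allRegistered.complete(null);
            }
        }
    }
}
```

The test itself would then wait with a timeout, e.g. `reporter.allRegistered.get(30, TimeUnit.SECONDS)`, which tolerates a slow TaskManager start without flaking.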
[jira] [Assigned] (FLINK-12139) Flink on mesos - Parameterize disk space needed.
[ https://issues.apache.org/jira/browse/FLINK-12139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-12139: - Assignee: Juan (was: Oleksandr Nitavskyi) > Flink on mesos - Parameterize disk space needed. > > > Key: FLINK-12139 > URL: https://issues.apache.org/jira/browse/FLINK-12139 > Project: Flink > Issue Type: Improvement > Components: Deployment / Mesos >Reporter: Juan >Assignee: Juan >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We are having a small issue while trying to deploy Flink on Mesos using > marathon. In our setup of Mesos, we are required to specify the amount of > disk space we want to have for the applications we deploy there. > The current default value in Flink is 0 and it is not currently > parameterizable. This means that we ask for 0 disk space for our instances, so > Flink can't work.
[jira] [Resolved] (FLINK-12139) Flink on mesos - Parameterize disk space needed.
[ https://issues.apache.org/jira/browse/FLINK-12139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-12139. --- Resolution: Fixed Fix Version/s: 1.9.0 Fixed via cf10e70088645b0fcc54ab03ecec3ef372ddb652 > Flink on mesos - Parameterize disk space needed. > > > Key: FLINK-12139 > URL: https://issues.apache.org/jira/browse/FLINK-12139 > Project: Flink > Issue Type: Improvement > Components: Deployment / Mesos >Reporter: Juan >Assignee: Juan >Priority: Minor > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We are having a small issue while trying to deploy Flink on Mesos using > marathon. In our setup of Mesos, we are required to specify the amount of > disk space we want to have for the applications we deploy there. > The current default value in Flink is 0 and it is not currently > parameterizable. This means that we ask for 0 disk space for our instances, so > Flink can't work.
[GitHub] [flink] tillrohrmann closed pull request #8224: [FLINK-12139][Mesos] Add disk space parameter.
tillrohrmann closed pull request #8224: [FLINK-12139][Mesos] Add disk space parameter. URL: https://github.com/apache/flink/pull/8224
[jira] [Updated] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12342: --- Labels: pull-request-available (was: ) > Yarn Resource Manager Acquires Too Many Containers > -- > > Key: FLINK-12342 > URL: https://issues.apache.org/jira/browse/FLINK-12342 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN >Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Environment: We run jobs on Flink release 1.6.3. >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Critical > Labels: pull-request-available > > In the current implementation of YarnFlinkResourceManager, containers are acquired > one by one as requests arrive from the SlotManager. This mechanism > works while the job is small, say fewer than 32 containers. If the job needs 256 > containers, containers can't be allocated immediately, and the pending requests > in the AMRMClient are not removed accordingly. We observed that the > AMRMClient asks for the current pending requests + 1 (the new request from the slot > manager) containers. As a result, during the start-up of such a job, it asked > for 4000+ containers. If an external dependency issue happens, for > example slow HDFS access, the whole job is blocked without > getting enough resources and is finally killed with a SlotManager request timeout. > Thus, we should use the total number of containers asked for, rather than the pending > requests in the AMRMClient, as the threshold for deciding whether to add > one more resource request.
[GitHub] [flink] HuangZhenQiu opened a new pull request #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to…
HuangZhenQiu opened a new pull request #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to… URL: https://github.com/apache/flink/pull/8306 ## What is the purpose of the change Fix the issue of requesting too many containers in FlinkResourceManager. Originally, each new request is bounded by the number of pending containers in AMRMClientAsync. As AMRMClientAsync issues a request for pendingContainers + 1 containers every time, it always asks for more than we need. Especially when a Flink application needs 100+ containers, the total number of requested containers can be in the thousands. The issue is amplified when we do cluster maintenance and restart 1000+ applications as quickly as possible. Too many requests overload the Yarn resource manager and introduce extra delay when restarting applications. ## Brief change log - record the total number of containers that have been requested and are still pending - use totalRequestedPendingContainers as the upper bound for new container acquisition ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable)
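The bookkeeping described in the change log can be sketched as follows. This is an illustrative model with hypothetical names, not the actual YarnResourceManager code: the resource manager maintains its own count of requested-but-unallocated containers and only issues a new request when that count is below the required number, instead of trusting the AMRMClient's lagging pending list.

```java
/** Hypothetical model of the fix: bound new container requests by our
 *  own count of outstanding requests. */
final class ContainerRequestTracker {
    // Containers we have asked for that have not been allocated yet.
    private int requestedPending;

    /** True if a new request should be issued for the given total demand. */
    boolean shouldRequestNewContainer(int requiredContainers) {
        return requestedPending < requiredContainers;
    }

    void onContainerRequested() {
        requestedPending++;
    }

    void onContainerAllocated() {
        // Never go negative, e.g. if an allocation arrives after a rescale.
        requestedPending = Math.max(0, requestedPending - 1);
    }

    int getRequestedPending() {
        return requestedPending;
    }
}
```

With this accounting, a demand for 256 containers yields at most 256 outstanding requests, rather than thousands of "pending + 1" re-requests.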
[GitHub] [flink] flinkbot commented on issue #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to…
flinkbot commented on issue #8306: [FLINK-12342] Use total requested pending containers to bound the maximum containers to… URL: https://github.com/apache/flink/pull/8306#issuecomment-487389322 (standard flinkbot review-progress comment, identical to the one on #8307 above)
[jira] [Commented] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations
[ https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828040#comment-16828040 ] Gyula Fora commented on FLINK-12141: [~yangfei] I haven't started any work on this, so if you are interested please go ahead :) > Allow @TypeInfo annotation on POJO field declarations > - > > Key: FLINK-12141 > URL: https://issues.apache.org/jira/browse/FLINK-12141 > Project: Flink > Issue Type: New Feature > Components: API / Type Serialization System >Reporter: Gyula Fora >Priority: Minor > Labels: starter > > The TypeInfo annotation is a great way to declare serializers for custom > types, however its usage is limited by the fact that it can only > be used on types that are declared in the project. > By allowing the annotation to be used on field declarations, we could improve > the TypeExtractor logic to use the type factory when creating the > PojoTypeInformation. > This would be a big improvement, as in many cases classes from other libraries > or collection types are used within custom Pojo classes, and Flink would > default to Kryo serialization, which hurts performance and causes problems > later. > The current workaround in these cases is to implement a custom serializer for > the entire pojo, which is a waste of effort when only a few fields might > require custom serialization logic.
[jira] [Assigned] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations
[ https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora reassigned FLINK-12141: -- Assignee: YangFei > Allow @TypeInfo annotation on POJO field declarations > - > > Key: FLINK-12141 > URL: https://issues.apache.org/jira/browse/FLINK-12141 > Project: Flink > Issue Type: New Feature > Components: API / Type Serialization System >Reporter: Gyula Fora >Assignee: YangFei >Priority: Minor > Labels: starter > > The TypeInfo annotation is a great way to declare serializers for custom > types, however its usage is limited by the fact that it can only > be used on types that are declared in the project. > By allowing the annotation to be used on field declarations, we could improve > the TypeExtractor logic to use the type factory when creating the > PojoTypeInformation. > This would be a big improvement, as in many cases classes from other libraries > or collection types are used within custom Pojo classes, and Flink would > default to Kryo serialization, which hurts performance and causes problems > later. > The current workaround in these cases is to implement a custom serializer for > the entire pojo, which is a waste of effort when only a few fields might > require custom serialization logic.
[GitHub] [flink] flinkbot commented on issue #8305: [hotfix] regenerate rest-docs to latest code
flinkbot commented on issue #8305: [hotfix] regenerate rest-docs to latest code URL: https://github.com/apache/flink/pull/8305#issuecomment-487389173 (standard flinkbot review-progress comment, identical to the one on #8307 above)
[jira] [Created] (FLINK-12359) SystemResourcesMetricsITCase unstable
Chesnay Schepler created FLINK-12359: Summary: SystemResourcesMetricsITCase unstable Key: FLINK-12359 URL: https://issues.apache.org/jira/browse/FLINK-12359 Project: Flink Issue Type: Bug Components: Runtime / Metrics, Tests Affects Versions: 1.9.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.9.0 The {{SystemResourcesMetricsITCase}} checks that task managers register a specific set of metrics if configured to do so. The test assumes that the TM has already started completely when the test begins, but this may not be the case.
[GitHub] [flink] Myasuka opened a new pull request #8305: [hotfix] regenerate rest-docs to latest code
Myasuka opened a new pull request #8305: [hotfix] regenerate rest-docs to latest code URL: https://github.com/apache/flink/pull/8305 ## What is the purpose of the change Regenerate the rest-docs to match the latest code. I happened to notice this while planning to implement [FLINK-12333](https://issues.apache.org/jira/browse/FLINK-12333). To avoid such hotfixes in the future, I created [FLINK-12358](https://issues.apache.org/jira/browse/FLINK-12358) to track this problem. ## Brief change log Regenerate rest-docs to latest code. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **not applicable**
[jira] [Created] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request
Yun Tang created FLINK-12358: Summary: Verify whether rest documentation needs to be updated when building pull request Key: FLINK-12358 URL: https://issues.apache.org/jira/browse/FLINK-12358 Project: Flink Issue Type: Improvement Components: Build System Reporter: Yun Tang Assignee: Yun Tang Currently, unlike the configuration docs, the REST API docs have no way to check whether they are up to date with the latest code. This is really annoying and hard to track if it is only checked manually by developers. I plan to add a check in Travis that verifies whether any files have been updated by using `git status`.
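The proposed Travis check could look roughly like this. This is a hedged sketch: the helper class, the docs path, and the exact `git` invocation are assumptions for illustration, not Flink build code. The decision itself is just "did `git status --porcelain` print anything after regenerating the docs".

```java
import java.util.Arrays;

/** Hypothetical CI helper: any non-blank line in `git status --porcelain`
 *  output means the checked-in docs are stale. */
final class RestDocsFreshnessCheck {

    /** Returns true if the porcelain output lists any changed or untracked file. */
    static boolean docsAreStale(String porcelainOutput) {
        return Arrays.stream(porcelainOutput.split("\n"))
                .anyMatch(line -> !line.trim().isEmpty());
    }

    public static void main(String[] args) {
        // In Travis the output would come from something like:
        //   new ProcessBuilder("git", "status", "--porcelain", "docs/").start()
        // run after regenerating the docs; here we only demonstrate the decision.
        String output = " M docs/generated/rest_dispatcher.html"; // illustrative path
        if (docsAreStale(output)) {
            System.out.println("REST docs are out of date; regenerate and commit them.");
        }
    }
}
```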
[GitHub] [flink] flinkbot commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files
flinkbot commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files URL: https://github.com/apache/flink/pull/8304#issuecomment-487386856 (standard flinkbot review-progress comment, identical to the one on #8307 above)
[GitHub] [flink] Aitozi commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files
Aitozi commented on issue #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files URL: https://github.com/apache/flink/pull/8304#issuecomment-487386975 And I see there are quite a lot of other tests that rely on `deleteOnExit` to do cleanup. Should I fix them too? Please take a look when you are free. Thanks @zentol.
[GitHub] [flink] Aitozi opened a new pull request #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files
Aitozi opened a new pull request #8304: [hotfix]Harden TextInputFormatTest to avoid failure by uncleaned files URL: https://github.com/apache/flink/pull/8304 ## What is the purpose of the change When I ran the test locally, it happened to fail and left uncleaned temp files in the tmp dir, so the test case could not pass again until I deleted the corresponding files. It uses `deleteOnExit` to clean up the temp files created by the test case, but that method does not run when the JVM crashes or is killed. I think this makes the test case fragile. ## Brief change log Add @Before and @After prepare/cleanup methods
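The @Before/@After pattern the PR describes can be sketched like this. The sketch is illustrative and omits the JUnit annotations so it stays self-contained; the point is that an explicit teardown deletes the temp directory even after a failed run, whereas `File#deleteOnExit` never fires if the JVM crashes or is killed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical test fixture: create the temp directory in setup and
 *  delete it recursively in teardown, instead of relying on deleteOnExit. */
final class TempDirFixture {
    private File tempDir;

    // Would carry @Before in a JUnit test.
    void setUp() throws IOException {
        tempDir = Files.createTempDirectory("text-input-format-test").toFile();
    }

    // Would carry @After in a JUnit test; runs even when the test fails.
    void tearDown() {
        deleteRecursively(tempDir);
    }

    static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete();
    }

    File getTempDir() {
        return tempDir;
    }
}
```

A leftover file from a killed run then only survives until the next run's teardown, not forever.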
[GitHub] [flink] klion26 commented on issue #8164: [FLINK-12184] history server compatible with old version
klion26 commented on issue #8164: [FLINK-12184] history server compatible with old version URL: https://github.com/apache/flink/pull/8164#issuecomment-487385580 @yumengz5 thanks for your reply, I'll file a new PR for this issue.
[GitHub] [flink] HuangZhenQiu commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration
HuangZhenQiu commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration URL: https://github.com/apache/flink/pull/8303#issuecomment-487385194 @rmetzger @tillrohrmann Please have a look at this PR when you have time.
[jira] [Updated] (FLINK-12343) Allow set file.replication in Yarn Configuration
[ https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12343: --- Labels: pull-request-available (was: ) > Allow set file.replication in Yarn Configuration > > > Key: FLINK-12343 > URL: https://issues.apache.org/jira/browse/FLINK-12343 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Deployment / YARN >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Zhenqiu Huang >Assignee: Zhenqiu Huang >Priority: Minor > Labels: pull-request-available > > Currently, FlinkYarnSessionCli uploads jars into HDFS with the default of 3 > replicas. From our production experience, we find that 3 replicas > can block a big job (256 containers) from launching when HDFS is slow due to > the heavy workload of batch pipelines. Thus, we want to make the replication factor > customizable from FlinkYarnSessionCli by adding an option.
[GitHub] [flink] flinkbot commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration
flinkbot commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration URL: https://github.com/apache/flink/pull/8303#issuecomment-487385120 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] HuangZhenQiu opened a new pull request #8303: [FLINK-12343]add file replication config for yarn configuration
HuangZhenQiu opened a new pull request #8303: [FLINK-12343]add file replication config for yarn configuration URL: https://github.com/apache/flink/pull/8303 ## What is the purpose of the change This pull request adds a file replication config to the Flink YARN configuration. This helps accelerate the bootstrap time of containers when a Flink application needs 100+ containers, by changing the default replication number (such as 3 for HDFS). ## Brief change log - Add a file-replication option in YarnConfigOptions - Apply the file-replication value to each of the application master's local resources - Apply the default file replication value 3 for task manager local resources ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (docs)
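The change log above can be illustrated with a sketch of what such a setting might look like in flink-conf.yaml. This is a hypothetical fragment: the option key `yarn.file-replication` and its placement are assumptions based on the PR description ("Add a file-replication option in YarnConfigOptions"), not a confirmed Flink option name.

```yaml
# Hypothetical sketch (key name assumed, not confirmed by this thread):
# raise the replication factor for the files Flink uploads to HDFS (job jars,
# config) so that hundreds of containers can localize them in parallel,
# without touching the cluster-wide dfs.replication default.
yarn.file-replication: 10
```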
[jira] [Updated] (FLINK-12343) Allow set file.replication in Yarn Configuration
[ https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenqiu Huang updated FLINK-12343: -- Summary: Allow set file.replication in Yarn Configuration (was: Allow set hdfs.replication in Yarn Configuration)
[GitHub] [flink] flinkbot commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
flinkbot commented on issue #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302#issuecomment-487382537 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[jira] [Updated] (FLINK-12269) Support Temporal Table Join in blink planner
[ https://issues.apache.org/jira/browse/FLINK-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12269: --- Labels: pull-request-available (was: ) > Support Temporal Table Join in blink planner > > > Key: FLINK-12269 > URL: https://issues.apache.org/jira/browse/FLINK-12269 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner > Reporter: Jark Wu > Assignee: Jark Wu > Priority: Major > Labels: pull-request-available > > Support translating the following "FOR SYSTEM_TIME AS OF" query into {{StreamExecTemporalTableJoin}}. > {code:sql} > SELECT > o.amount, o.currency, r.rate, o.amount * r.rate > FROM > Orders AS o > JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r > ON r.currency = o.currency > {code} > This is an extension to the current temporal join (FLINK-9738) using a standard syntax introduced in Calcite 1.19.
[GitHub] [flink] wuchong opened a new pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime
wuchong opened a new pull request #8302: [FLINK-12269][table-blink] Support Temporal Table Join in blink planner and runtime URL: https://github.com/apache/flink/pull/8302 ## What is the purpose of the change Support translating "FOR SYSTEM_TIME AS OF" queries into temporal table joins for both Batch and Stream. ## Brief change log - Introduce API interfaces to support lookup table sources: `LookupableTableSource`, `AsyncTableFunction`, `DefinedPrimaryKey`, `DefinedTableIndex`, `TableIndex`, `LookupConfig` - Support generating optimized logical & physical plans for temporal table joins for STREAM and BATCH. - Introduce temporal table join operators and async join operators for the runtime. Some differences between this PR and the blink branch: - Support returning multiple rows from an async temporal table join - Only allow `FOR SYSTEM_TIME AS OF` on the left table's proctime field, not on the `PROCTIME()` built-in function. This keeps the syntax clean. - Lookup functions support external types (i.e. `String`, `Timestamp`, ...). ## Verifying this change - Add several validation tests, plan tests, and integration tests. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable)
[jira] [Commented] (FLINK-12343) Allow set hdfs.replication in Yarn Configuration
[ https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827997#comment-16827997 ] Zhenqiu Huang commented on FLINK-12343: --- It can be set in both ways. The goal here is to set a bigger number for YARN applications that require more than 128 containers, to accelerate container initialization. Changing the replication factor for a directory only affects the existing files; new files under the directory are created with the cluster's default replication factor (dfs.replication from hdfs-site.xml). As we don't want to apply a bigger replication factor to any other files in HDFS, I think we have to set it from the client side. What do you think?
[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827993#comment-16827993 ] Zhenqiu Huang commented on FLINK-12342: --- [~till.rohrmann] The job that has the issue tries to acquire 256 containers. At the time it starts, the HDFS that serves the jars is relatively slow. You can imagine the sum 1 + 2 + 3 + ... + T. It stops acquiring new containers once 256 - T < the pending requests in AMRMClientAsync. > Yarn Resource Manager Acquires Too Many Containers > -- > > Key: FLINK-12342 > URL: https://issues.apache.org/jira/browse/FLINK-12342 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN > Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Environment: We run jobs on Flink release 1.6.3. > Reporter: Zhenqiu Huang > Assignee: Zhenqiu Huang > Priority: Critical > > In the current implementation of YarnFlinkResourceManager, it starts to acquire new containers one by one when it gets requests from the SlotManager. The mechanism works when the job is small, say fewer than 32 containers. If the job has 256 containers, containers can't be allocated immediately and pending requests in AMRMClient are not removed accordingly. We observe that AMRMClient asks for the current pending requests + 1 (the new request from the slot manager) containers. In this way, during the start of such a job, it asked for 4000+ containers. If an external dependency issue happens, for example slow HDFS access, the whole job is blocked without getting enough resources and is finally killed with a SlotManager request timeout. > Thus, we should use the total number of containers asked for, rather than the pending requests in AMRMClient, as the threshold to decide whether we need to add one more resource request.
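The snowballing described in this comment can be reproduced with a small, self-contained simulation (plain Java, no YARN dependency; the loop is a stand-in for repeated `AMRMClientAsync#addContainerRequest` calls, not the real API):

```java
// Simulates the reported behaviour: each time the SlotManager asks for one
// more container, the resource manager re-requests "pending + 1" containers,
// so unfulfilled requests snowball while allocation is stalled.
public class ContainerRequestSimulation {

    /** Total containers ever asked for when nothing gets fulfilled. */
    public static long totalRequested(int needed) {
        int pending = 0;     // requests sitting unfulfilled in the client
        long total = 0;      // containers asked for in total
        for (int slotRequest = 1; slotRequest <= needed; slotRequest++) {
            int ask = pending + 1;  // current pending requests + the new one
            total += ask;
            pending = ask;          // nothing is fulfilled while HDFS is slow
        }
        return total;
    }

    public static void main(String[] args) {
        // With zero fulfilment, total asked is 1 + 2 + ... + 256 = 32896.
        System.out.println(totalRequested(256));
    }
}
```

With partial fulfilment the total is lower, which is consistent with the 4000+ requests observed in production rather than the worst case.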
[jira] [Closed] (FLINK-12307) Support translation from StreamExecWindowJoin to StreamTransformation.
[ https://issues.apache.org/jira/browse/FLINK-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Young closed FLINK-12307. -- Resolution: Fixed Fix Version/s: 1.9.0 merged in 1.9.0: 177e7ff1291f397294f7b7dab1d791a07adc6a5a > Support translation from StreamExecWindowJoin to StreamTransformation. > -- > > Key: FLINK-12307 > URL: https://issues.apache.org/jira/browse/FLINK-12307 > Project: Flink > Issue Type: Task > Components: Table SQL / API >Reporter: Jing Zhang >Assignee: Jing Zhang >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] KurtYoung merged pull request #8261: [FLINK-12307][table-planner-blink] Support translation from StreamExecWindowJoin to StreamTransformation.
KurtYoung merged pull request #8261: [FLINK-12307][table-planner-blink] Support translation from StreamExecWindowJoin to StreamTransformation. URL: https://github.com/apache/flink/pull/8261
[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827982#comment-16827982 ] Yumeng Zhang commented on FLINK-12183: -- lamber-ken, I agree with you. It's not a good idea to break the interface. But like I said, just skipping null values seems a little weird; it seems we don't know why there are null values. Regarding the attempt count: in general, we cannot let a streaming job attempt many times. But consider this case: if our streaming job hits a bad record that it cannot handle, and we have enabled checkpointing but not configured a restart strategy, it will try to recover again and again until we notice. The number of attempts depends on how soon we find it, so we cannot guarantee the number of attempts is just a few. Checking the elements one by one in EvictingBoundedList may be efficient enough, but I think the best way to solve this null pointer problem is to not break the interface and meanwhile get the right start index. > Job Cluster doesn't stop after cancel a running job in per-job Yarn mode > > > Key: FLINK-12183 > URL: https://issues.apache.org/jira/browse/FLINK-12183 > Project: Flink > Issue Type: Bug > Components: Runtime / REST > Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Reporter: Yumeng Zhang > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The per-job YARN cluster doesn't stop after canceling a running job if the job restarted many times, like 1000 times, in a short time. > The bug is in the archiveExecutionGraph() phase, before removeJobAndRegisterTerminationFuture() executes. The CompletableFuture thread exits unexpectedly with a NullPointerException in the archiveExecutionGraph() phase. That is hard to find because only IOException is caught there.
In SubtaskExecutionAttemptDetailsHandler and SubtaskExecutionAttemptAccumulatorsHandler, when calling the archiveJsonWithPath() method, it constructs some JSON information about prior execution attempts, but the index starts from 0, which might be a dropped index in the loop. By default, it returns null when trying to get the prior execution attempt (AccessExecution attempt = subtask.getPriorExecutionAttempt(x)).
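The dropped-index behaviour described here can be sketched with a minimal stand-in for Flink's EvictingBoundedList (the class and method names below are illustrative, not Flink's actual API): naive iteration from index 0 hits evicted entries and gets null, while starting from the first retained index avoids the NPE.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an evicting bounded list: keeps only the last
// `capacity` elements but remembers how many were ever added, so early
// indices are "evicted" and read back as null.
public class BoundedHistory<T> {
    private final int capacity;
    private final List<T> tail = new ArrayList<>();
    private int totalAdded = 0;

    public BoundedHistory(int capacity) { this.capacity = capacity; }

    public void add(T element) {
        if (tail.size() == capacity) {
            tail.remove(0); // evict the oldest retained element
        }
        tail.add(element);
        totalAdded++;
    }

    /** Returns null for evicted indices -- the source of the NPE. */
    public T get(int index) {
        int first = firstRetainedIndex();
        return (index < first || index >= totalAdded) ? null : tail.get(index - first);
    }

    /** The correct start index for iteration, instead of starting at 0. */
    public int firstRetainedIndex() { return Math.max(0, totalAdded - capacity); }

    public int size() { return totalAdded; }

    public static void main(String[] args) {
        BoundedHistory<Integer> history = new BoundedHistory<>(16);
        for (int attempt = 0; attempt < 1000; attempt++) {
            history.add(attempt); // 1000 restart attempts, only 16 kept
        }
        System.out.println(history.get(0));               // null: evicted
        System.out.println(history.firstRetainedIndex()); // 984
        System.out.println(history.get(history.firstRetainedIndex())); // 984
    }
}
```

Iterating from `firstRetainedIndex()` to `size()` only touches retained attempts, which matches the comment's suggestion of "getting the right start index" without changing the interface.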
[GitHub] [flink] flinkbot commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig
flinkbot commented on issue #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig URL: https://github.com/apache/flink/pull/8301#issuecomment-487373286 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[jira] [Updated] (FLINK-12357) Remove useless code in TableConfig
[ https://issues.apache.org/jira/browse/FLINK-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12357: --- Labels: pull-request-available (was: ) > Remove useless code in TableConfig > -- > > Key: FLINK-12357 > URL: https://issues.apache.org/jira/browse/FLINK-12357 > Project: Flink > Issue Type: Bug > Components: Table SQL / API >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] hequn8128 opened a new pull request #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig
hequn8128 opened a new pull request #8301: [FLINK-12357][api-java][hotfix] Remove useless code in TableConfig URL: https://github.com/apache/flink/pull/8301 ## What is the purpose of the change Remove useless code in TableConfig. It was added accidentally in [FLINK-11067](https://issues.apache.org/jira/browse/FLINK-11067). ## Brief change log - Remove `userDefinedConfig` in `TableConfig` ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no)
[GitHub] [flink] yumengz5 commented on issue #8164: [FLINK-12184] history server compatible with old version
yumengz5 commented on issue #8164: [FLINK-12184] history server compatible with old version URL: https://github.com/apache/flink/pull/8164#issuecomment-487372370 @klion26 Sorry about the late reply. All right, you can take it over.
[jira] [Commented] (FLINK-12343) Allow set hdfs.replication in Yarn Configuration
[ https://issues.apache.org/jira/browse/FLINK-12343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827957#comment-16827957 ] Till Rohrmann commented on FLINK-12343: --- Isn't the replication factor an HDFS configuration you need to specify when starting the HDFS cluster or can Yarn applications specify this on their own?
[jira] [Commented] (FLINK-12342) Yarn Resource Manager Acquires Too Many Containers
[ https://issues.apache.org/jira/browse/FLINK-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827956#comment-16827956 ] Till Rohrmann commented on FLINK-12342: --- Thanks for reporting this issue [~hpeter]. For my understanding, will the previous container requests be added up with every call to {{AMRMClientAsync#addContainerRequest}} to the total number of container requests? So to say, the system would request 1 + 2 + 3 + 4 + 5 + ... + n if calling n times {{addContainerRequest}} and if no container request could be fulfilled in the meantime? Or how does it happen that we add 4000+ container requests to the {{AMRMClient}}?
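Till's 1 + 2 + ... + n hypothesis has a closed form, the triangular number n(n+1)/2, so reaching the reported 4000+ cumulative requests would need only about 89 unfulfilled rounds (89 * 90 / 2 = 4005). A minimal numeric check:

```java
// Checks the hypothesis from this thread numerically: if each of n calls
// effectively re-adds all previous unfulfilled requests plus one new one,
// total requests grow as the triangular number n(n+1)/2.
public class RequestGrowth {

    /** Total requests after n rounds with zero fulfilment: n(n+1)/2. */
    public static long total(int n) {
        return (long) n * (n + 1) / 2;
    }

    /** Smallest number of rounds whose cumulative total reaches `target`. */
    public static int roundsToReach(long target) {
        int n = 0;
        while (total(n) < target) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 89 rounds suffice: 89 * 90 / 2 = 4005 >= 4000.
        System.out.println(roundsToReach(4000));
    }
}
```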
[GitHub] [flink] link3280 closed pull request #8299: [FLINK-12356][BuildSystem] Optimise version expression of flink-shaded-hadoop2(-uber)
link3280 closed pull request #8299: [FLINK-12356][BuildSystem] Optimise version expression of flink-shaded-hadoop2(-uber) URL: https://github.com/apache/flink/pull/8299
[jira] [Closed] (FLINK-12346) Scala-suffix check broken on Travis
[ https://issues.apache.org/jira/browse/FLINK-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-12346. Resolution: Fixed master: 8f02746ec7316ccdc6d25e915f3b58595438cc44 > Scala-suffix check broken on Travis > --- > > Key: FLINK-12346 > URL: https://issues.apache.org/jira/browse/FLINK-12346 > Project: Flink > Issue Type: Bug > Components: Build System, Travis >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > the scala-suffix check currently does not work on travis since the maven > output is not what the script expects. On travis we have timestamps in the > maven output, which breaks the parsing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] zentol merged pull request #8289: [FLINK-12346][travis][build] Account for timestamps in scala-suffix check
zentol merged pull request #8289: [FLINK-12346][travis][build] Account for timestamps in scala-suffix check URL: https://github.com/apache/flink/pull/8289
[jira] [Commented] (FLINK-12260) Slot allocation failure by taskmanager registration timeout and race
[ https://issues.apache.org/jira/browse/FLINK-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827919#comment-16827919 ] Xiaolei Jia commented on FLINK-12260: - Hi, If this issue is still open, I'd like to fix this problem with an attempt count, as [~till.rohrmann] suggested. Thanks. > Slot allocation failure by taskmanager registration timeout and race > > > Key: FLINK-12260 > URL: https://issues.apache.org/jira/browse/FLINK-12260 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.6.3 >Reporter: Hwanju Kim >Priority: Critical > Attachments: FLINK-12260-repro.diff > > > > In 1.6.2., we have seen slot allocation failure keep happening for long time. > Having looked at the log, I see the following behavior: > # TM sends a registration request R1 to resource manager. > # R1 times out after 100ms, which is initial timeout. > # TM retries a registration request R2 to resource manager (with timeout > 200ms). > # R2 arrives first at resource manager and registered, and then TM gets > successful response moving onto step 5 below. > # On successful registration, R2's instance is put to > taskManagerRegistrations > # Then R1 arrives at resource manager and realizes the same TM resource ID > is already registered, which then unregisters R2's instance ID from > taskManagerRegistrations. A new instance ID for R1 is registered to > workerRegistration. > # R1's response is not handled though since it already timed out (see akka > temp actor resolve failure below), hence no registration to > taskManagerRegistrations. > # TM keeps heartbeating to the resource manager with slot status. > # Resource manager ignores this slot status, since taskManagerRegistrations > contains R2, not R1, which replaced R2 in workerRegistration at step 6. > # Slot request can never be fulfilled, timing out. 
> The following are the debug logs for the above steps: > > {code:java} > JM log: > 2019-04-11 22:39:40.000,Registering TaskManager with ResourceID > 46c8e0d0fcf2c306f11954a1040d5677 > (akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at > ResourceManager > 2019-04-11 22:39:40.000,Registering TaskManager > 46c8e0d0fcf2c306f11954a1040d5677 under deade132e2c41c52019cdc27977266cf at > the SlotManager. > 2019-04-11 22:39:40.000,Replacing old registration of TaskExecutor > 46c8e0d0fcf2c306f11954a1040d5677. > 2019-04-11 22:39:40.000,Unregister TaskManager > deade132e2c41c52019cdc27977266cf from the SlotManager. > 2019-04-11 22:39:40.000,Registering TaskManager with ResourceID > 46c8e0d0fcf2c306f11954a1040d5677 > (akka.ssl.tcp://flink@flink-taskmanager:6122/user/taskmanager_0) at > ResourceManager > TM log: > 2019-04-11 22:39:40.000,Registration at ResourceManager attempt 1 > (timeout=100ms) > 2019-04-11 22:39:40.000,Registration at ResourceManager > (akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager) attempt 1 > timed out after 100 ms > 2019-04-11 22:39:40.000,Registration at ResourceManager attempt 2 > (timeout=200ms) > 2019-04-11 22:39:40.000,Successful registration at resource manager > akka.ssl.tcp://flink@flink-jobmanager:6123/user/resourcemanager under > registration id deade132e2c41c52019cdc27977266cf. > 2019-04-11 22:39:41.000,resolve of path sequence [/temp/$c] failed{code} > > As RPC calls seem to use akka ask, which creates a temporary source actor, I think multiple RPC calls could have arrived out of order via different actor pairs, and the symptom above seems to be due to that. If so, could the calls carry an attempt count in their arguments to prevent the unexpected unregistration? At this point, what I have done is only log analysis, so I could do further analysis, but before that I wanted to check whether it's a known issue. I also searched with some relevant terms and log pieces, but couldn't find a duplicate.
Please deduplicate if any.
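The attempt-count fix suggested above can be sketched in plain Java (the class and method names are illustrative, not Flink's actual RPC layer): registrations carry an attempt number, and messages from older attempts are ignored instead of clobbering the newer registration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested fix: tag each registration with an attempt number
// and ignore messages from stale attempts, so a late-arriving first attempt
// (R1) cannot unregister the already-successful retry (R2).
public class AttemptAwareRegistry {
    private final Map<String, Integer> latestAttempt = new HashMap<>();

    /** Returns true if the registration was accepted. */
    public boolean register(String resourceId, int attempt) {
        Integer seen = latestAttempt.get(resourceId);
        if (seen != null && attempt <= seen) {
            return false; // stale or duplicate attempt: ignore, don't unregister
        }
        latestAttempt.put(resourceId, attempt);
        return true;
    }

    public static void main(String[] args) {
        AttemptAwareRegistry registry = new AttemptAwareRegistry();
        // Retry R2 arrives first and registers successfully...
        System.out.println(registry.register("tm-1", 2));
        // ...then the delayed original R1 arrives and is simply ignored.
        System.out.println(registry.register("tm-1", 1));
    }
}
```

In the scenario from the report, step 6 would become a no-op: R1's late arrival neither replaces nor unregisters R2's instance.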
[jira] [Commented] (FLINK-10052) Tolerate temporarily suspended ZooKeeper connections
[ https://issues.apache.org/jira/browse/FLINK-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827912#comment-16827912 ] Dominik Wosiński commented on FLINK-10052: -- Hey, Sorry, somehow I have missed the issue. I will fix it ASAP. > Tolerate temporarily suspended ZooKeeper connections > > > Key: FLINK-10052 > URL: https://issues.apache.org/jira/browse/FLINK-10052 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.4.2, 1.5.2, 1.6.0 >Reporter: Till Rohrmann >Assignee: Dominik Wosiński >Priority: Major > > This issue results from FLINK-10011 which uncovered a problem with Flink's HA > recovery and proposed the following solution to harden Flink: > The {{ZooKeeperLeaderElectionService}} uses the {{LeaderLatch}} Curator > recipe for leader election. The leader latch revokes leadership in case of a > suspended ZooKeeper connection. This can be premature in case that the system > can reconnect to ZooKeeper before its session expires. The effect of the lost > leadership is that all jobs will be canceled and directly restarted after > regaining the leadership. > Instead of directly revoking the leadership upon a SUSPENDED ZooKeeper > connection, it would be better to wait until the ZooKeeper connection is > LOST. That way we would allow the system to reconnect and not lose the > leadership. This could be achievable by using Curator's {{LeaderSelector}} > instead of the {{LeaderLatch}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
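The behaviour proposed in this issue can be sketched with a plain-Java stand-in (not Curator's real API): leadership is revoked only on LOST, while SUSPENDED is tolerated in the hope that the session survives a reconnect.

```java
// Plain-Java sketch of the proposal: unlike LeaderLatch-style behaviour,
// which revokes leadership already on SUSPENDED, only a LOST connection
// (session expiry) revokes leadership here. Names are illustrative.
public class LeadershipGuard {
    enum ZkConnectionState { CONNECTED, SUSPENDED, LOST }

    private boolean leader = true;

    public void onStateChange(ZkConnectionState state) {
        if (state == ZkConnectionState.LOST) {
            leader = false; // session expired: leadership must be revoked
        }
        // SUSPENDED is tolerated: the client may reconnect before expiry,
        // avoiding an unnecessary cancel-and-restart of all jobs.
    }

    public boolean isLeader() { return leader; }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard();
        guard.onStateChange(ZkConnectionState.SUSPENDED);
        System.out.println(guard.isLeader()); // still leader after SUSPENDED
        guard.onStateChange(ZkConnectionState.LOST);
        System.out.println(guard.isLeader()); // leadership revoked on LOST
    }
}
```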
[jira] [Comment Edited] (FLINK-12219) Yarn application can't stop when flink job failed in per-job yarn cluster mode
[ https://issues.apache.org/jira/browse/FLINK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824919#comment-16824919 ] lamber-ken edited comment on FLINK-12219 at 4/28/19 10:41 AM: -- no, in detached mode. was (Author: lamber-ken): right. > Yarn application can't stop when flink job failed in per-job yarn cluster mode > - > > Key: FLINK-12219 > URL: https://issues.apache.org/jira/browse/FLINK-12219 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN, Runtime / REST >Affects Versions: 1.6.3, 1.8.0 >Reporter: lamber-ken >Assignee: lamber-ken >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: fix-bug.patch, image-2019-04-17-15-00-40-687.png, > image-2019-04-17-15-02-49-513.png, image-2019-04-23-17-37-00-081.png > > Time Spent: 10m > Remaining Estimate: 0h > > h3. *Issue detail info* > In our Flink (1.6.3) production environment, I often encounter a situation where the YARN application can't stop after a Flink job fails in per-job YARN cluster mode, so I analyzed in depth why it happens. > When a Flink job fails, the system writes an archive file to a FileSystem through the +MiniDispatcher#archiveExecutionGraph+ method, then notifies YarnJobClusterEntrypoint to shut down. But if +MiniDispatcher#archiveExecutionGraph+ throws an exception during execution, it breaks the calls that follow. > So I opened [FLINK-12247|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-12247] to solve the NPE bug that occurs when the system writes the archive to a FileSystem. But we still need to consider other exceptions, so we should catch Exception / Throwable, not just IOException. > h3. *Flink yarn job fail flow* > !image-2019-04-23-17-37-00-081.png! > h3. *Flink yarn job fail on yarn* > !image-2019-04-17-15-00-40-687.png! > > h3. *Flink yarn application can't stop* > !image-2019-04-17-15-02-49-513.png! > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
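The "catch Throwable, not just IOException" point above can be sketched as follows; the class and method names are illustrative stand-ins, not Flink's actual MiniDispatcher code:

```java
/**
 * Illustrative sketch (simplified names, not Flink's actual classes): the
 * shutdown notification must be sent even if writing the archive fails with
 * any Throwable, not just an IOException, otherwise the YARN application
 * keeps running after the job has failed.
 */
public class ArchiveThenShutdown {
    interface ArchiveWriter {
        void write(String archivedGraph) throws Exception;
    }

    static String run(ArchiveWriter writer, String graph) {
        try {
            writer.write(graph);
            return "archived-and-shutdown";
        } catch (Throwable t) {
            // Catch Throwable, not just IOException: an unexpected
            // RuntimeException (e.g. an NPE while serializing prior
            // attempts) must not block cluster shutdown either.
            return "archive-failed-but-shutdown";
        }
    }

    public static void main(String[] args) {
        // A writer that fails with an unchecked exception still lets the
        // shutdown step run.
        System.out.println(run(g -> { throw new NullPointerException(); }, "graph"));
        // prints "archive-failed-but-shutdown"
    }
}
```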
[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901 ] lamber-ken edited comment on FLINK-12183 at 4/28/19 10:33 AM: -- [~Yumeng], hi, what a coincidence! We hit the same problem. I'm sorry for creating a duplicate issue. When I created FLINK-12247, I just checked the latest SubtaskExecutionAttemptDetailsHandler and SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found this problem exists there too. *First,* this problem has been bothering us for a long time. It has existed from flink-1.2.1 through flink-1.6.3. As you say, it's hard to find and intermittent, so I also used UML to describe the flow in https://issues.apache.org/jira/browse/FLINK-12219. *Second,* at the beginning I solved this problem with an approach like your patch. But I don't think it's ideal, because it breaks the interface, and [ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3] doesn't need to implement it. BTW, priorExecutions is an EvictingBoundedList, so it's very efficient to check whether an element exists just by its index. Generally, we only allow a streaming job to attempt several times, not up to 1000 times, so just skipping null values seems more appropriate to me.
*Third,* to prevent an unexpected RuntimeException, we should move `jobTerminationFuture.complete` into a finally block:
{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode > > > Key: FLINK-12183 > URL: https://issues.apache.org/jira/browse/FLINK-12183 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Yumeng Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The per-job Yarn cluster doesn't stop after cancelling a running job if the job > restarted many times, like 1000 times, in a short time. > The bug is in the archiveExecutionGraph() phase before executing > removeJobAndRegisterTerminationF
[jira] [Commented] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827901#comment-16827901 ] lamber-ken commented on FLINK-12183: [~Yumeng], hi, what a coincidence! We hit the same problem. I'm sorry for creating a duplicate issue. When I created FLINK-12247, I just checked the latest SubtaskExecutionAttemptDetailsHandler and SubtaskExecutionAttemptAccumulatorsHandler in GitHub's master branch, and found this problem exists there too. *First,* this problem has been bothering us for a long time. It has existed from flink-1.3.2 through flink-1.6.3. As you say, it's hard to find, so I also used UML to describe the flow in https://issues.apache.org/jira/browse/FLINK-12219. *Second,* at the beginning I solved this problem with an approach like your patch. But I don't think it's ideal, because it breaks the interface, and [ExecutionVertex.java|https://github.com/apache/flink/pull/8163/files#diff-52349a7928cbb1217a0704390cedbee3] doesn't need to implement it. BTW, priorExecutions is an EvictingBoundedList, so we can judge whether an element exists just by its index. So just skipping null values seems more appropriate to me.
*Third,* to prevent an unexpected RuntimeException, we should move `jobTerminationFuture.complete` into a finally block:
{code:java}
protected void jobReachedGloballyTerminalState(ArchivedExecutionGraph archivedExecutionGraph) {
    try {
        super.jobReachedGloballyTerminalState(archivedExecutionGraph);
    } catch (Exception e) {
        log.error("jobReachedGloballyTerminalState exception", e);
    } finally {
        if (executionMode == ClusterEntrypoint.ExecutionMode.DETACHED) {
            // shut down since we don't have to wait for the execution result retrieval
            jobTerminationFuture.complete(ApplicationStatus.fromJobStatus(archivedExecutionGraph.getState()));
        }
    }
}
{code}
> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode > > > Key: FLINK-12183 > URL: https://issues.apache.org/jira/browse/FLINK-12183 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Yumeng Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The per-job Yarn cluster doesn't stop after cancelling a running job if the job > restarted many times, like 1000 times, in a short time. > The bug is in the archiveExecutionGraph() phase before executing > removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will > exit unexpectedly with a NullPointerException in the archiveExecutionGraph() phase. > It's hard to find because only IOException is caught there. In > SubtaskExecutionAttemptDetailsHandler and > SubtaskExecutionAttemptAccumulatorsHandler, when calling > archiveJsonWithPath(), they construct some JSON information about > prior execution attempts, but the index starts from 0, which might be a dropped > index for the loop. By default, it will return null when trying to get the > prior execution attempt (AccessExecution attempt = > subtask.getPriorExecutionAttempt(x)). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
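The null-skipping behaviour discussed above can be illustrated with a simplified stand-in for EvictingBoundedList; the names and structure are illustrative, not Flink's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (simplified, not Flink's EvictingBoundedList): a bounded
 * list that evicts its oldest entries keeps stable indices, so reading an
 * evicted index yields null. An archiver iterating prior attempts should skip
 * those nulls instead of failing with a NullPointerException.
 */
public class PriorAttemptArchiver {
    static class BoundedAttempts {
        private final int capacity;
        private final List<String> entries = new ArrayList<>();
        private int evicted; // number of oldest entries dropped

        BoundedAttempts(int capacity) { this.capacity = capacity; }

        void add(String attempt) {
            entries.add(attempt);
            while (entries.size() > capacity) {
                entries.remove(0);
                evicted++;
            }
        }

        int size() { return evicted + entries.size(); }

        /** Returns null for indices that have been evicted. */
        String get(int index) {
            return index < evicted ? null : entries.get(index - evicted);
        }
    }

    /** Collects the attempts still retained, skipping evicted (null) ones. */
    static List<String> archive(BoundedAttempts attempts) {
        List<String> archived = new ArrayList<>();
        for (int i = 0; i < attempts.size(); i++) {
            String attempt = attempts.get(i);
            if (attempt != null) { // evicted index: skip instead of NPE
                archived.add(attempt);
            }
        }
        return archived;
    }

    public static void main(String[] args) {
        BoundedAttempts attempts = new BoundedAttempts(2);
        attempts.add("attempt-0");
        attempts.add("attempt-1");
        attempts.add("attempt-2"); // evicts attempt-0
        System.out.println(archive(attempts)); // [attempt-1, attempt-2]
    }
}
```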
[jira] [Created] (FLINK-12357) Remove useless code in TableConfig
Hequn Cheng created FLINK-12357: --- Summary: Remove useless code in TableConfig Key: FLINK-12357 URL: https://issues.apache.org/jira/browse/FLINK-12357 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Hequn Cheng Assignee: Hequn Cheng -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12175) TypeExtractor.getMapReturnTypes produces different TypeInformation than createTypeInformation for classes with parameterized ancestors
[ https://issues.apache.org/jira/browse/FLINK-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827895#comment-16827895 ] YangFei commented on FLINK-12175: - [~andbul] are you working on this fix? Can I work on this ticket? > TypeExtractor.getMapReturnTypes produces different TypeInformation than > createTypeInformation for classes with parameterized ancestors > -- > > Key: FLINK-12175 > URL: https://issues.apache.org/jira/browse/FLINK-12175 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.7.2, 1.9.0 >Reporter: Dylan Adams >Priority: Major > > I expect that the {{TypeMapper}} {{createTypeInformation}} and > {{getMapReturnTypes}} would produce equivalent type information for the same > type. But when there is a parameterized superclass, this does not appear to > be the case. > Here's a test case that could be added to {{PojoTypeExtractorTest.java}} that > demonstrates the issue:
{code}
public static class Pojo implements Serializable {
    public int digits;
    public String letters;
}

public static class ParameterizedParent<T> implements Serializable {
    public T pojoField;
}

public static class ConcreteImpl extends ParameterizedParent<Pojo> {
    public double precise;
}

public static class ConcreteMapFunction implements MapFunction<ConcreteImpl, ConcreteImpl> {
    @Override
    public ConcreteImpl map(ConcreteImpl value) throws Exception {
        return null;
    }
}

@Test
public void testMapReturnType() {
    final TypeInformation directTypeInfo = TypeExtractor.createTypeInfo(ConcreteImpl.class);
    Assert.assertTrue(directTypeInfo instanceof PojoTypeInfo);
    TypeInformation directPojoFieldTypeInfo =
        ((PojoTypeInfo) directTypeInfo).getPojoFieldAt(0).getTypeInformation();
    Assert.assertTrue(directPojoFieldTypeInfo instanceof PojoTypeInfo);

    final TypeInformation mapReturnTypeInfo =
        TypeExtractor.getMapReturnTypes(new ConcreteMapFunction(), directTypeInfo);
    Assert.assertTrue(mapReturnTypeInfo instanceof PojoTypeInfo);
    TypeInformation mapReturnPojoFieldTypeInfo =
        ((PojoTypeInfo) mapReturnTypeInfo).getPojoFieldAt(0).getTypeInformation();
    Assert.assertTrue(mapReturnPojoFieldTypeInfo instanceof PojoTypeInfo);
    Assert.assertEquals(directTypeInfo, mapReturnTypeInfo);
}
{code}
> This test case will fail on the last two asserts because > {{getMapReturnTypes}} produces a {{TypeInformation}} for {{ConcreteImpl}} > with a {{GenericTypeInfo}} for the {{pojoField}}, whereas > {{createTypeInformation}} correctly produces a {{PojoTypeInfo}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] wujinhu removed a comment on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer
wujinhu removed a comment on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer URL: https://github.com/apache/flink/pull/7798#issuecomment-471811588 @StefanRRichter would you please help review this PR? Thanks :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] wujinhu commented on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer
wujinhu commented on issue #7798: [FLINK-11388][fs] Add Aliyun OSS recoverable writer URL: https://github.com/apache/flink/pull/7798#issuecomment-487364585 Rebase master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-11762) Update WindowOperatorMigrationTest for 1.8
[ https://issues.apache.org/jira/browse/FLINK-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827885#comment-16827885 ] vinoyang commented on FLINK-11762: -- [~yangfei] [https://github.com/apache/flink/pull/8168] > Update WindowOperatorMigrationTest for 1.8 > -- > > Key: FLINK-11762 > URL: https://issues.apache.org/jira/browse/FLINK-11762 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream, Tests >Affects Versions: 1.8.0 >Reporter: vinoyang >Assignee: vinoyang >Priority: Blocker > Fix For: 1.9.0 > > > Update {{WindowOperatorMigrationTest}} so that it covers restoring from 1.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] flinkbot commented on issue #8300: [FLINK-11638] Translate Savepoints page into Chinese
flinkbot commented on issue #8300: [FLINK-11638] Translate Savepoints page into Chinese URL: https://github.com/apache/flink/pull/8300#issuecomment-487363436 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-11762) Update WindowOperatorMigrationTest for 1.8
[ https://issues.apache.org/jira/browse/FLINK-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827883#comment-16827883 ] YangFei commented on FLINK-11762: - [~yanghua] can you tell me the umbrella issue which contains all of the subtasks? I want to have a look > Update WindowOperatorMigrationTest for 1.8 > -- > > Key: FLINK-11762 > URL: https://issues.apache.org/jira/browse/FLINK-11762 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream, Tests >Affects Versions: 1.8.0 >Reporter: vinoyang >Assignee: vinoyang >Priority: Blocker > Fix For: 1.9.0 > > > Update {{WindowOperatorMigrationTest}} so that it covers restoring from 1.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11638) Translate "Savepoints" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11638: --- Labels: pull-request-available (was: ) > Translate "Savepoints" page into Chinese > > > Key: FLINK-11638 > URL: https://issues.apache.org/jira/browse/FLINK-11638 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available > > doc locates in flink/docs/ops/state/savepoints.zh.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12141) Allow @TypeInfo annotation on POJO field declarations
[ https://issues.apache.org/jira/browse/FLINK-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827882#comment-16827882 ] YangFei commented on FLINK-12141: I am interested in this issue. [~gyfora], are you working on it now? If not, I can take it. Thanks. > Allow @TypeInfo annotation on POJO field declarations > - > > Key: FLINK-12141 > URL: https://issues.apache.org/jira/browse/FLINK-12141 > Project: Flink > Issue Type: New Feature > Components: API / Type Serialization System >Reporter: Gyula Fora >Priority: Minor > Labels: starter > > The TypeInfo annotation is a great way to declare serializers for custom > types; however, its usage is limited by the fact that it can only > be used on types that are declared in the project. > By allowing the annotation to be used on field declarations, we could improve > the TypeExtractor logic to use the type factory when creating the > PojoTypeInformation. > This would be a big improvement, as in many cases classes from other libraries > or collection types are used within custom Pojo classes, and Flink would > default to Kryo serialization, which would hurt performance and cause problems > later. > The current workaround in these cases is to implement a custom serializer for > the entire pojo, which is a waste of effort when only a few fields might > require custom serialization logic.
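A minimal, self-contained sketch of what the proposal's mechanics could look like. Note the assumptions: the nested `TypeInfo` annotation below is a local stand-in (Flink's real `@TypeInfo` targets types only, which is exactly what the issue asks to extend), and `ThirdPartyTypeInfoFactory`, `MyPojo`, and `factoryFor` are hypothetical names for illustration. A TypeExtractor-like consumer would discover a per-field factory via reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class FieldTypeInfoSketch {
    // Stand-in for Flink's @TypeInfo with ElementType.FIELD added,
    // which is what FLINK-12141 proposes. Illustrative only.
    @Target({ElementType.TYPE, ElementType.FIELD})
    @Retention(RetentionPolicy.RUNTIME)
    @interface TypeInfo {
        Class<?> value();
    }

    // Hypothetical factory class for a third-party field type.
    static class ThirdPartyTypeInfoFactory {}

    // Proposed usage: annotate a single POJO field instead of writing a
    // custom serializer for the entire POJO.
    static class MyPojo {
        @TypeInfo(ThirdPartyTypeInfoFactory.class)
        Object thirdPartyField;
        int plainField; // still handled by the normal POJO extraction logic
    }

    // How a TypeExtractor-like consumer could look up the factory per field;
    // returns null when the field is absent or unannotated.
    static Class<?> factoryFor(String fieldName) {
        try {
            TypeInfo anno = MyPojo.class.getDeclaredField(fieldName)
                    .getAnnotation(TypeInfo.class);
            return anno == null ? null : anno.value();
        } catch (NoSuchFieldException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(factoryFor("thirdPartyField").getSimpleName());
    }
}
```

The point of the sketch is that field-level resolution needs no new machinery beyond reading a runtime-retained annotation during POJO analysis.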
[GitHub] [flink] XuQianJin-Stars opened a new pull request #8300: [FLINK-11638] Translate Savepoints page into Chinese
XuQianJin-Stars opened a new pull request #8300: [FLINK-11638] Translate Savepoints page into Chinese URL: https://github.com/apache/flink/pull/8300 ## What is the purpose of the change *(This pull request to Translate "Savepoints" page into Chinese.)* ## Brief change log - *doc locates in flink/docs/ops/state/savepoints.zh.md* ## Verifying this change *(Please pick either of the following options)* no change to test. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
[GitHub] [flink] link3280 edited a comment on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)
link3280 edited a comment on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber) URL: https://github.com/apache/flink/pull/8299#issuecomment-487362995 Oops. It seems that these modules would be removed in the future. https://github.com/apache/flink/pull/8225
[GitHub] [flink] link3280 commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)
link3280 commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber) URL: https://github.com/apache/flink/pull/8299#issuecomment-487362995 Oops. It seems that these modules would be removed in the future.
[jira] [Updated] (FLINK-12356) Optimise version experssion of flink-shaded-hadoop2(-uber)
[ https://issues.apache.org/jira/browse/FLINK-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12356: --- Labels: pull-request-available (was: ) > Optimise version experssion of flink-shaded-hadoop2(-uber) > -- > > Key: FLINK-12356 > URL: https://issues.apache.org/jira/browse/FLINK-12356 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.8.1 >Reporter: Paul Lin >Assignee: Paul Lin >Priority: Trivial > Labels: pull-request-available > > Since the new version scheme was introduced for the hadoop-based modules, we have used version > literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; these can be > replaced by the `${parent.version}` variable for better management.
[GitHub] [flink] flinkbot commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)
flinkbot commented on issue #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber) URL: https://github.com/apache/flink/pull/8299#issuecomment-487362574
[GitHub] [flink] link3280 opened a new pull request #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber)
link3280 opened a new pull request #8299: [FLINK-12356][BuildSystem] Optimise version experssion of flink-shaded-hadoop2(-uber) URL: https://github.com/apache/flink/pull/8299
## What is the purpose of the change
Since the new version scheme was introduced for the hadoop-based modules, we have used version literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; these can be replaced by the `${parent.version}` variable for better management.
## Brief change log
- Use `${parent.version}` to replace version literals in version expressions.
## Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
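The literal-to-variable replacement described above can be sketched as a `pom.xml` fragment. The coordinates and the `1.8.1` literal here are illustrative of the pattern, not an exact diff of the files touched:

```xml
<!-- Before: version literal hard-coded in the module's pom -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop2</artifactId>
    <version>1.8.1</version>
</dependency>

<!-- After: inherit the version from the parent pom, so a single
     bump of the parent version updates every dependent module -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop2</artifactId>
    <version>${parent.version}</version>
</dependency>
```

In Maven, `${parent.version}` resolves to the version declared in the module's `<parent>` element, which is what makes the expression track the project release automatically.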
[jira] [Closed] (FLINK-12347) flink-table-runtime-blink is missing scala suffix
[ https://issues.apache.org/jira/browse/FLINK-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-12347. Resolution: Fixed master: 6bc25c2c3691d9bd2c12fcb0b3c123cf5ab70b7b > flink-table-runtime-blink is missing scala suffix > - > > Key: FLINK-12347 > URL: https://issues.apache.org/jira/browse/FLINK-12347 > Project: Flink > Issue Type: Bug > Components: Build System, Table SQL / Runtime >Affects Versions: 1.9.0 >Reporter: Chesnay Schepler >Assignee: Liya Fan >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > {{flink-table-runtime-blink}} has a dependency on {{flink-streaming-java}} > and thus requires a scala suffix.
[GitHub] [flink] zentol merged pull request #8293: [FLINK-12347][Table SQL/Runtime]add scala suffix for flink-table-runt…
zentol merged pull request #8293: [FLINK-12347][Table SQL/Runtime]add scala suffix for flink-table-runt… URL: https://github.com/apache/flink/pull/8293
[jira] [Created] (FLINK-12356) Optimise version experssion of flink-shaded-hadoop2(-uber)
Paul Lin created FLINK-12356: Summary: Optimise version experssion of flink-shaded-hadoop2(-uber) Key: FLINK-12356 URL: https://issues.apache.org/jira/browse/FLINK-12356 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.8.1 Reporter: Paul Lin Assignee: Paul Lin Since the new version scheme was introduced for the hadoop-based modules, we have used version literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; these can be replaced by the `${parent.version}` variable for better management.
[GitHub] [flink] eaglewatcherwb closed pull request #8298: test
eaglewatcherwb closed pull request #8298: test URL: https://github.com/apache/flink/pull/8298
[GitHub] [flink] flinkbot commented on issue #8298: test
flinkbot commented on issue #8298: test URL: https://github.com/apache/flink/pull/8298#issuecomment-487359619
[GitHub] [flink] eaglewatcherwb opened a new pull request #8298: test
eaglewatcherwb opened a new pull request #8298: test URL: https://github.com/apache/flink/pull/8298
[jira] [Commented] (FLINK-12351) AsyncWaitOperator should deep copy StreamElement when object reuse is enabled
[ https://issues.apache.org/jira/browse/FLINK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827861#comment-16827861 ] aitozi commented on FLINK-12351: Hi [~jark], I just checked the object reuse config in the other operators. I found that it is mostly checked in the batch-related operators. Is only AsyncWaitOperator affected? I think other operators may be affected too. If users enable object reuse without paying attention to the side effects of mutating objects, as the documentation of ExecutionConfig#enableObjectReuse warns, the results may be unpredictable. Alternatively, we could move the object reuse switch to the operator level so that users can configure the behaviour of each operator individually. What do you think? > AsyncWaitOperator should deep copy StreamElement when object reuse is enabled > - > > Key: FLINK-12351 > URL: https://issues.apache.org/jira/browse/FLINK-12351 > Project: Flink > Issue Type: Bug >Reporter: Jark Wu >Priority: Major > Fix For: 1.9.0 > > > Currently, AsyncWaitOperator directly puts the input StreamElement into > {{StreamElementQueue}}. But when object reuse is enabled, the StreamElement > is reused, which means the element in {{StreamElementQueue}} will be > modified. As a result, the output of AsyncWaitOperator might be wrong. > An easy way to fix this might be to deep copy the input StreamElement when > object reuse is enabled, like this: > https://github.com/apache/flink/blob/blink/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/async/AsyncWaitOperator.java#L209
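The hazard FLINK-12351 describes can be shown in a minimal, self-contained Java sketch (not Flink code): when upstream reuses one mutable object, enqueueing the reference lets later mutations corrupt earlier queue entries, while deep-copying on enqueue, as the issue proposes for AsyncWaitOperator, preserves each element. `Element` and `copy()` below are illustrative stand-ins for a StreamElement and a serializer's deep copy:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ObjectReuseDemo {
    // Stand-in for a mutable StreamElement; copy() plays the role of
    // a serializer's deep copy. Illustrative only, not Flink's API.
    static final class Element {
        long value;
        Element(long value) { this.value = value; }
        Element copy() { return new Element(value); }
    }

    /** Returns {aliasedValues, copiedValues} after three "emissions". */
    static long[][] run() {
        Element reused = new Element(0);
        Deque<Element> aliased = new ArrayDeque<>(); // buggy: stores the reused reference
        Deque<Element> copied = new ArrayDeque<>();  // fixed: deep copy on enqueue

        for (long v = 1; v <= 3; v++) {
            reused.value = v;          // upstream mutates the same object
            aliased.add(reused);       // every entry aliases one object
            copied.add(reused.copy()); // each entry is an independent snapshot
        }
        long[][] out = new long[2][3];
        int i = 0;
        for (Element e : aliased) { out[0][i++] = e.value; }
        i = 0;
        for (Element e : copied) { out[1][i++] = e.value; }
        return out;
    }

    public static void main(String[] args) {
        long[][] r = run();
        // Aliased entries all reflect only the last mutation; copies keep history.
        System.out.println(java.util.Arrays.toString(r[0])); // [3, 3, 3]
        System.out.println(java.util.Arrays.toString(r[1])); // [1, 2, 3]
    }
}
```

This is also why the cost of the fix is bounded: the copy is only needed when object reuse is enabled; otherwise each input element is already a fresh object.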
[jira] [Comment Edited] (FLINK-12183) Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
[ https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827860#comment-16827860 ] Yumeng Zhang edited comment on FLINK-12183 at 4/28/19 8:41 AM: --- FLINK-12247 seems to fix this problem, but the fix looks too blunt: it just skips the null elements. If the subtask has many attempts, say 1000, it will still traverse all 1000 of them. I don't think this truly fixes the problem; we have to understand why there are null elements and address the cause rather than ignore it. was (Author: yumeng): FLINK-12247 seems to fix this problem, but it looks too brutal, just skips the null elements. If the subtask has many attempts like 1000 times, it will traverse the subtask's attempts 1000 times. I don't think this way truly fixes the problem. > Job Cluster doesn't stop after cancel a running job in per-job Yarn mode > > > Key: FLINK-12183 > URL: https://issues.apache.org/jira/browse/FLINK-12183 > Project: Flink > Issue Type: Bug > Components: Runtime / REST >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Yumeng Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The per-job Yarn cluster doesn't stop after cancelling a running job if the job > restarted many times (e.g. 1000 times) in a short time. > The bug is in the archiveExecutionGraph() phase, before executing > removeJobAndRegisterTerminationFuture(). The CompletableFuture thread will > exit unexpectedly with a NullPointerException in the archiveExecutionGraph() phase. > It is hard to find because only IOException is caught there. In > SubtaskExecutionAttemptDetailsHandler and > SubtaskExecutionAttemptAccumulatorsHandler, when calling the > archiveJsonWithPath() method, it constructs some JSON information about > prior execution attempts, but the loop index starts from 0, which might be a > dropped index. By default, it will return null when trying to get the > prior execution attempt (AccessExecution attempt = > subtask.getPriorExecutionAttempt(x)).
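For illustration, the workaround discussed above (FLINK-12247, as described here) amounts to a null guard inside the archiving loop. A minimal self-contained sketch of that pattern, with `attemptHistory` and `archive` as hypothetical stand-ins for the subtask attempt history and `archiveJsonWithPath`:

```java
import java.util.ArrayList;
import java.util.List;

public class PriorAttemptsDemo {
    // Illustrative stand-in: an attempt history where early attempts
    // were dropped, leaving null entries at their indices.
    static List<String> attemptHistory() {
        List<String> attempts = new ArrayList<>();
        attempts.add(null);       // attempt 0: dropped
        attempts.add(null);       // attempt 1: dropped
        attempts.add("attempt-2");
        return attempts;
    }

    // Guarded archiving loop in the spirit of the described workaround:
    // skip null prior attempts instead of dereferencing them. Naive code
    // that assumes index 0..n are all populated would NPE on the nulls.
    static List<String> archive(List<String> attempts) {
        List<String> archived = new ArrayList<>();
        for (int x = 0; x < attempts.size(); x++) {
            String attempt = attempts.get(x);
            if (attempt == null) {
                continue; // dropped attempt
            }
            archived.add("archived:" + attempt);
        }
        return archived;
    }

    public static void main(String[] args) {
        System.out.println(archive(attemptHistory())); // [archived:attempt-2]
    }
}
```

The commenter's objection is visible in the sketch: the loop still walks every index, dropped or not, and the guard hides rather than explains why the nulls exist.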