[jira] [Updated] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2023-10-31 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo updated FLINK-25055:
---
Fix Version/s: 1.19.0

> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Fang Yong
> Assignee: Yangze Guo
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.19.0
>
>
> We submit batch jobs to a Flink session cluster with the eager scheduler for OLAP. 
> The JM deploys subtasks to TaskManagers independently, so downstream subtasks 
> may start before the upstream ones are ready. A downstream subtask sends a 
> PartitionRequest to each upstream subtask and may receive a 
> PartitionNotFoundException in response. It then retries the PartitionRequest 
> every few ms until it times out.
> The current approach raises two problems. First, there are too many retry 
> PartitionRequest messages: each downstream subtask sends a PartitionRequest 
> to all of its upstream subtasks, so the total number of messages is O(N*N), 
> where N is the parallelism of the subtasks. Second, the interval between 
> polling retries adds to the delay before upstream and downstream tasks 
> confirm the PartitionRequest.
> We want to support a listen-and-notify mechanism for PartitionRequest when the 
> job needs no failover. The upstream TaskManager will add the PartitionRequest to 
> a listen list with a timeout checker, and notify the request when the task 
> registers its partition in the TaskManager.
> [~nkubicek] I noticed that your scenario of using Flink is similar to ours. 
> What do you think? I also hope to hear from you, [~trohrmann]. Thanks.
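
For illustration only, the listen-and-notify idea in the description above could be sketched roughly as follows. This is not Flink code and not the implementation attached to this issue: the class name PartitionListenRegistry, its methods, and the use of plain String partition ids are all hypothetical, chosen to keep the sketch self-contained. A request for a partition that is not yet registered is parked in a listen list together with a timeout check, instead of being answered with PartitionNotFoundException and retried by the downstream side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of an upstream-side listen list; not a Flink class. */
class PartitionListenRegistry {
    // Partitions that upstream tasks have already registered.
    private final Set<String> registered = new HashSet<>();
    // Pending requests ("listen list"), grouped by the partition they wait for.
    private final Map<String, List<CompletableFuture<String>>> listeners = new HashMap<>();
    // Single daemon thread acting as the timeout checker.
    private final ScheduledExecutorService timeoutChecker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "partition-request-timeout");
                t.setDaemon(true);
                return t;
            });

    /** Handles an incoming request: answer at once if possible, otherwise listen. */
    synchronized CompletableFuture<String> request(String partitionId, long timeoutMillis) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (registered.contains(partitionId)) {
            future.complete(partitionId);
            return future;
        }
        listeners.computeIfAbsent(partitionId, k -> new ArrayList<>()).add(future);
        timeoutChecker.schedule(
                () -> expire(partitionId, future), timeoutMillis, TimeUnit.MILLISECONDS);
        return future;
    }

    /** Called when the upstream task registers its partition: notify all listeners. */
    synchronized void registerPartition(String partitionId) {
        registered.add(partitionId);
        List<CompletableFuture<String>> waiting = listeners.remove(partitionId);
        if (waiting != null) {
            waiting.forEach(f -> f.complete(partitionId));
        }
    }

    /** Fails a request that is still waiting when its timeout fires. */
    private synchronized void expire(String partitionId, CompletableFuture<String> future) {
        List<CompletableFuture<String>> waiting = listeners.get(partitionId);
        if (waiting != null && waiting.remove(future)) {
            if (waiting.isEmpty()) {
                listeners.remove(partitionId);
            }
            future.completeExceptionally(
                    new TimeoutException("Partition not registered in time: " + partitionId));
        }
    }

    void shutdown() {
        timeoutChecker.shutdownNow();
    }
}
```

In this sketch each downstream request produces one message and at most one notification, so the polling interval no longer adds to the start-up delay. A real implementation would also have to deal with failover, connection loss, and request cancellation, which the sketch ignores.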



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2022-07-06 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-25055:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 30 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it.


> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Shammon
> Assignee: Shammon
> Priority: Major
> Labels: pull-request-available, stale-assigned
>





[jira] [Updated] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2022-04-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-25055:
---
Labels: pull-request-available  (was: )

> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Shammon
> Assignee: Shammon
> Priority: Major
> Labels: pull-request-available
>





[jira] [Updated] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2021-12-14 Thread Shammon (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shammon updated FLINK-25055:

Parent: FLINK-25318
Issue Type: Sub-task  (was: Improvement)

> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Shammon
> Assignee: Shammon
> Priority: Major
>





[jira] [Updated] (FLINK-25055) Support listen and notify mechanism for PartitionRequest

2021-11-25 Thread Shammon (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shammon updated FLINK-25055:

Description: 
We submit batch jobs to flink session cluster with eager scheduler for olap. JM 
deploys subtasks to TaskManager independently, and the downstream subtasks may 
start before the upstream ones are ready. The downstream subtask sends 
PartitionRequest to upstream ones, and may receive PartitionNotFoundException 
from them. Then it will retry to send PartitionRequest after a few ms until 
timeout.

The current approach raises two problems. First, there will be too many retry 
PartitionRequest messages. Each downstream subtask will send PartitionRequest 
to all its upstream subtasks and the total number of messages will be O(N*N), 
where N is the parallelism of subtasks. Secondly, the interval between polling 
retries will increase the delay for upstream and downstream tasks to confirm 
PartitionRequest.

We want to support listen and notify mechanism for PartitionRequest when the 
job needs no failover. Upstream TaskManager will add the PartitionRequest to a 
listen list with a timeout checker, and notify the request when the task 
register its partition in the TaskManager.

[~nkubicek] I noticed that your scenario of using flink is similar to ours. 
What do you think?  And hope to hear from you [~trohrmann] THX

  was:
We submit batch jobs to flink session cluster with eager scheduler for olap. JM 
deploys subtasks to TaskManager independently, and the downstream subtasks may 
start before the upstream ones are not ready. The downstream subtask sends 
PartitionRequest to upstream ones, and may receive PartitionNotFoundException 
from them. Then it will retry to send PartitionRequest after a few ms until 
timeout.

The current approach raises two problems. First, there will be too many retry 
PartitionRequest messages. Each downstream subtask will send PartitionRequest 
to all its upstream subtasks and the total number of messages will be O(N*N), 
where N is the parallelism of subtasks. Secondly, the interval between polling 
retries will increase the delay for upstream and downstream tasks to confirm 
PartitionRequest.

We want to support listen and notify mechanism for PartitionRequest when the 
job needs no failover. Upstream TaskManager will add the PartitionRequest to a 
listen list with a timeout checker, and notify the request when the task 
register its partition in the TaskManager.

[~nkubicek] I noticed that your scenario of using flink is similar to ours. 
What do you think?  And hope to hear from you [~trohrmann] THX


> Support listen and notify mechanism for PartitionRequest
> 
>
> Key: FLINK-25055
> URL: https://issues.apache.org/jira/browse/FLINK-25055
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Affects Versions: 1.14.0, 1.12.5, 1.13.3
> Reporter: Shammon
> Priority: Major
>


