[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2020-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14163:
---
Labels: pull-request-available  (was: )

> Execution#producedPartitions is possibly not assigned when used
> ---
>
> Key: FLINK-14163
> URL: https://issues.apache.org/jira/browse/FLINK-14163
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Zhu Zhu
>Assignee: Yuan Mei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>
> Currently {{Execution#producedPartitions}} is assigned only after the
> partitions have completed registration with the shuffle master in
> {{Execution#registerProducedPartitions(...)}}.
> Partition registration is an async interface
> ({{ShuffleMaster#registerPartitionWithProducer(...)}}), so
> {{Execution#producedPartitions}} is possibly[1] not set when it is used.
> Usages include:
> 1. deploying this task, so that the task may be deployed without its result
> partitions assigned, and the job would hang. (DefaultScheduler issue only,
> since the legacy scheduler handled this case)
> 2. generating input descriptors for downstream tasks:
> 3. retrieving {{ResultPartitionID}} for partition releasing:
> [1] If a user uses Flink's default shuffle master {{NettyShuffleMaster}}, this
> is not problematic at the moment, since it returns a completed future on
> registration, which makes registration effectively synchronous. However, if
> users implement their own shuffle service in which
> {{ShuffleMaster#registerPartitionWithProducer}} returns a pending future, it
> can be a problem. This is possible because a customizable shuffle service has
> been open to users since 1.9 (via config "shuffle-service-factory.class").
> To avoid these issues, we may either
> 1. fix all the usages of {{Execution#producedPartitions}} to account for the
> async assignment, or
> 2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync
> interface
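
The timing problem described above can be reproduced in isolation. The following is a minimal sketch, not Flink's actual code: ExecutionSketch, its String-typed map and partitionsAssigned() are hypothetical stand-ins for Execution and its producedPartitions field. It only assumes that the field is assigned inside the callback of the registration future.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class ProducedPartitionsRace {

    // Hypothetical stand-in for Execution: the field is assigned only in the
    // callback of the registration future, so it stays null until then.
    static class ExecutionSketch {
        private volatile Map<String, String> producedPartitions;

        CompletableFuture<Void> registerProducedPartitions(
                CompletableFuture<Map<String, String>> registration) {
            return registration.thenAccept(partitions -> this.producedPartitions = partitions);
        }

        boolean partitionsAssigned() {
            return producedPartitions != null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> partitions =
                Collections.singletonMap("partition-1", "descriptor-1");

        // Completed future: the callback runs right away, so the field is
        // already set when registerProducedPartitions returns.
        ExecutionSketch completedCase = new ExecutionSketch();
        completedCase.registerProducedPartitions(CompletableFuture.completedFuture(partitions));
        System.out.println(completedCase.partitionsAssigned()); // true

        // Pending future: a caller that reads the field now sees nothing.
        CompletableFuture<Map<String, String>> pending = new CompletableFuture<>();
        ExecutionSketch pendingCase = new ExecutionSketch();
        pendingCase.registerProducedPartitions(pending);
        System.out.println(pendingCase.partitionsAssigned()); // false

        // The field only becomes visible once the registration completes.
        pending.complete(partitions);
        System.out.println(pendingCase.partitionsAssigned()); // true
    }
}
```

The first case corresponds to a shuffle master that returns a completed future, the second to a custom implementation that returns a pending one.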



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2020-01-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Description: 
Currently {{Execution#producedPartitions}} is assigned only after the
partitions have completed registration with the shuffle master in
{{Execution#registerProducedPartitions(...)}}.
Partition registration is an async interface
({{ShuffleMaster#registerPartitionWithProducer(...)}}), so
{{Execution#producedPartitions}} is possibly[1] not set when it is used.

Usages include:
1. deploying this task, so that the task may be deployed without its result
partitions assigned, and the job would hang. (DefaultScheduler issue only,
since the legacy scheduler handled this case)
2. generating input descriptors for downstream tasks:
3. retrieving {{ResultPartitionID}} for partition releasing:

[1] If a user uses Flink's default shuffle master {{NettyShuffleMaster}}, this
is not problematic at the moment, since it returns a completed future on
registration, which makes registration effectively synchronous. However, if
users implement their own shuffle service in which
{{ShuffleMaster#registerPartitionWithProducer}} returns a pending future, it
can be a problem. This is possible because a customizable shuffle service has
been open to users since 1.9 (via config "shuffle-service-factory.class").

To avoid these issues, we may either
1. fix all the usages of {{Execution#producedPartitions}} to account for the
async assignment, or
2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync
interface
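
The second remedy can be illustrated with a small sketch. Everything in it is hypothetical and for illustration only: SyncShuffleMaster is not a Flink interface, and the String identifiers stand in for the real partition and producer descriptors. The point is only that a synchronous registration call leaves no window in which the partition map is unassigned.

```java
import java.util.HashMap;
import java.util.Map;

class SyncRegistrationSketch {

    // Hypothetical synchronous variant of the registration call: it returns
    // the descriptor directly instead of a future.
    interface SyncShuffleMaster {
        String registerPartitionWithProducer(String partitionId, String producerId);
    }

    // The returned map is fully populated by the time this method returns,
    // so callers can never observe a missing partition assignment.
    static Map<String, String> registerProducedPartitions(
            SyncShuffleMaster shuffleMaster, String producerId, String... partitionIds) {
        Map<String, String> producedPartitions = new HashMap<>();
        for (String partitionId : partitionIds) {
            producedPartitions.put(
                    partitionId,
                    shuffleMaster.registerPartitionWithProducer(partitionId, producerId));
        }
        return producedPartitions;
    }

    public static void main(String[] args) {
        SyncShuffleMaster shuffleMaster =
                (partitionId, producerId) -> partitionId + "@" + producerId;
        Map<String, String> partitions =
                registerProducedPartitions(shuffleMaster, "producer-1", "p1", "p2");
        System.out.println(partitions.get("p1")); // p1@producer-1
    }
}
```

The trade-off is that a slow shuffle master would then block the caller for the duration of the registration.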

  was:
Currently {{Execution#producedPartitions}} is assigned after the partitions 
have completed the registration to shuffle master in 
{{Execution#registerProducedPartitions(...)}}.
The partition registration is an async interface 
({{ShuffleMaster#registerPartitionWithProducer(...)}}), so 
{{Execution#producedPartitions}} is possible[1] not set when used. 

Usages includes:
1. deploying this task, so that the task may be deployed without its result 
partitions assigned, and the job would hang. (DefaultScheduler issue only, 
since legacy scheduler handled this case)
2. generating input descriptors for downstream tasks: 
3. retrieve {{ResultPartitionID}} for partition releasing: 

[1] If a user uses Flink default shuffle master {{NettyShuffleMaster}}, it is 
not problematic at the moment since it returns a completed future on 
registration, so that it would be a synchronized process. However, if users 
implement their own shuffle service in which the 
{{ShuffleMaster#registerPartitionWithProducer}} returns an pending future, it 
can be a problem. This is possible since customizable shuffle service is open 
to users since 1.9 (via config "shuffle-service-factory.class").

To avoid issues to happen, we may either 
1. fix all the usages of {{Execution#producedPartitions}} regarding the async 
set, or 
2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync 
interface






[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2020-01-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Description: 
Currently {{Execution#producedPartitions}} is assigned only after the
partitions have completed registration with the shuffle master in
{{Execution#registerProducedPartitions(...)}}.
Partition registration is an async interface
({{ShuffleMaster#registerPartitionWithProducer(...)}}), so
{{Execution#producedPartitions}} is possibly[1] not set when it is used.

Usages include:
1. deploying this task, so that the task may be deployed without its result
partitions assigned, and the job would hang. (DefaultScheduler issue only,
since the legacy scheduler handled this case)
2. generating input descriptors for downstream tasks:
3. retrieving {{ResultPartitionID}} for partition releasing:

[1] If a user uses Flink's default shuffle master {{NettyShuffleMaster}}, this
is not problematic at the moment, since it returns a completed future on
registration, which makes registration effectively synchronous. However, if
users implement their own shuffle service in which
{{ShuffleMaster#registerPartitionWithProducer}} returns a pending future, it
can be a problem. This is possible because a customizable shuffle service has
been open to users since 1.9 (via config "shuffle-service-factory.class").

To avoid these issues, we may either
1. fix all the usages of {{Execution#producedPartitions}} to account for the
async set, or
2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync
interface

  was:
Currently {{Execution#producedPartitions}} is assigned after the partitions 
have completed the registration to shuffle master in 
{{Execution#registerProducedPartitions(...)}}.
But the task deployment process (in {{Execution#deploy())}} will create 
{{ResultPartitionDeploymentDescriptor}} directly from 
{{Execution#producedPartitions}} without checking whether it's assigned.
This may lead to a task deployed without its result partitions. And eventually 
cause the job to hang.

It is not problematic at the moment when using Flink default shuffle master 
{{NettyShuffleMaster}} since it returns a completed future on registration. 
However, if the behavior is changed or if users are using a customized 
{{ShuffleMaster}}, it may cause problems.

Besides that, {{Execution#producedPartitions}} is also used for 
 * generating downstream task input descriptor
 * retrieve {{ResultPartitionID}} for partition releasing

To avoid issues to happen, we may need to change all the usages of 
{{Execution#producedPartitions}} to a callback way, e.g. change 
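{{Execution#producedPartitions}} from
{{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
{{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
and adjust all its usages.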







[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2019-12-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Summary: Execution#producedPartitions is possibly not assigned when used  
(was: Execution#producedPartitions is possibly not assigned when deploying 
tasks with DefaultScheduler)

> Execution#producedPartitions is possibly not assigned when used
> ---
>
> Key: FLINK-14163
> URL: https://issues.apache.org/jira/browse/FLINK-14163
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.10.0
>
>
> Currently {{Execution#producedPartitions}} is assigned only after the
> partitions have completed registration with the shuffle master in
> {{Execution#registerProducedPartitions(...)}}.
> But the task deployment process (in {{Execution#deploy()}}) creates
> {{ResultPartitionDeploymentDescriptor}} directly from
> {{Execution#producedPartitions}} without checking whether it has been
> assigned.
> This may lead to a task being deployed without its result partitions, and
> eventually cause the job to hang.
> It is not problematic at the moment when using Flink's default shuffle master
> {{NettyShuffleMaster}}, since it returns a completed future on registration.
> However, if that behavior changes or if users are using a customized
> {{ShuffleMaster}}, it may cause problems.
> Besides that, {{Execution#producedPartitions}} is also used for
>  * generating downstream task input descriptors
>  * retrieving {{ResultPartitionID}} for partition releasing
> To avoid these issues, we may need to change all the usages of
> {{Execution#producedPartitions}} to a callback style, e.g. change
> {{Execution#producedPartitions}} from
> {{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
> {{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
> and adjust all its usages.
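
The callback style proposed above could look roughly like the following sketch. It is not the actual Flink change: FutureBackedPartitions, its String-typed map and the deploy() return value are hypothetical simplifications. It shows only the idea of holding the partitions as a future and chaining deployment onto it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class FutureBackedPartitions {

    // Hypothetical replacement for a plain Map field: consumers must go
    // through the future and therefore cannot read an unassigned value.
    private final CompletableFuture<Map<String, String>> producedPartitions;

    FutureBackedPartitions(CompletableFuture<Map<String, String>> registration) {
        this.producedPartitions = registration;
    }

    // Deployment is a continuation of the registration, so it cannot run
    // before the partitions are available.
    CompletableFuture<String> deploy() {
        return producedPartitions.thenApply(
                partitions -> "deployed with " + partitions.size() + " partition(s)");
    }

    public static void main(String[] args) {
        CompletableFuture<Map<String, String>> pending = new CompletableFuture<>();
        FutureBackedPartitions execution = new FutureBackedPartitions(pending);

        CompletableFuture<String> deployment = execution.deploy();
        System.out.println(deployment.isDone()); // false

        pending.complete(Collections.singletonMap("partition-1", "descriptor-1"));
        System.out.println(deployment.join()); // deployed with 1 partition(s)
    }
}
```

Every other reader of the field would need the same treatment, which is why the description speaks of adjusting all usages.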





[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2019-09-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Affects Version/s: 1.9.0

> Execution#producedPartitions is possibly not assigned when used
> ---
>
> Key: FLINK-14163
> URL: https://issues.apache.org/jira/browse/FLINK-14163
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.10.0
>
>
> Currently {{Execution#producedPartitions}} is assigned only after the
> partitions have completed registration with the shuffle master in
> {{Execution#registerProducedPartitions(...)}}.
> But the task deployment process (in {{Execution#deploy()}}) creates
> {{ResultPartitionDeploymentDescriptor}} directly from
> {{Execution#producedPartitions}} without checking whether it has been
> assigned.
> This may lead to a task being deployed without its result partitions, and
> eventually cause the job to hang.
> It is not problematic at the moment when using Flink's default shuffle master
> {{NettyShuffleMaster}}, since it returns a completed future on registration.
> However, if that behavior changes or if users are using a customized
> {{ShuffleMaster}}, it may cause problems.
> Besides that, {{Execution#producedPartitions}} is also used for
>  * generating downstream task input descriptors
>  * retrieving {{ResultPartitionID}} for partition releasing
> To avoid these issues, we may need to change all the usages of
> {{Execution#producedPartitions}} to a callback style, e.g. change
> {{Execution#producedPartitions}} from
> {{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
> {{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
> and adjust all its usages.





[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2019-09-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Description: 
Currently {{Execution#producedPartitions}} is assigned only after the
partitions have completed registration with the shuffle master in
{{Execution#registerProducedPartitions(...)}}.
But the task deployment process (in {{Execution#deploy()}}) creates
{{ResultPartitionDeploymentDescriptor}} directly from
{{Execution#producedPartitions}} without checking whether it has been
assigned.
This may lead to a task being deployed without its result partitions, and
eventually cause the job to hang.

It is not problematic at the moment when using Flink's default shuffle master
{{NettyShuffleMaster}}, since it returns a completed future on registration.
However, if that behavior changes or if users are using a customized
{{ShuffleMaster}}, it may cause problems.

Besides that, {{Execution#producedPartitions}} is also used for
 * generating downstream task input descriptors
 * retrieving {{ResultPartitionID}} for partition releasing

To avoid these issues, we may need to change all the usages of
{{Execution#producedPartitions}} to a callback style, e.g. change
{{Execution#producedPartitions}} from
{{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
{{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
and adjust all its usages.

  was:
Currently {{Execution#producedPartitions}} is assigned after the partitions 
have completed the registration to shuffle master in 
{{Execution#registerProducedPartitions(...)}}.
But the task deployment process (in {{Execution#deploy())}} will create 
{{ResultPartitionDeploymentDescriptor}} directly from 
{{Execution#producedPartitions}} without checking whether it's assigned.
This may lead to a task deployed without its result partitions. And eventually 
cause the job to hang.

It is not problematic at the moment when using Flink default shuffle master 
{{NettyShuffleMaster}} since it returns a completed future on registration. 
However, if the behavior is changed or if users are using a customized 
{{ShuffleMaster}}, it may cause problems.

Besides that, {{Execution#producedPartitions}} is also used for 
 * generating downstream task input descriptor
 * retrieve {{ResultPartitionID}} for partition releasing

To avoid issues to happen, we may need to change all the usages of 
{{Execution#producedPartitions} to a callback way, e.g. change 
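{{Execution#producedPartitions} from
{{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
{{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
and adjust all its usages.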







[jira] [Updated] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2019-09-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14163:

Summary: Execution#producedPartitions is possibly not assigned when used  
(was: Execution#producedPartitions may not be assigned when used)

> Execution#producedPartitions is possibly not assigned when used
> ---
>
> Key: FLINK-14163
> URL: https://issues.apache.org/jira/browse/FLINK-14163
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Priority: Major
> Fix For: 1.10.0
>
>
> Currently {{Execution#producedPartitions}} is assigned only after the
> partitions have completed registration with the shuffle master in
> {{Execution#registerProducedPartitions(...)}}.
> But the task deployment process (in {{Execution#deploy()}}) creates
> {{ResultPartitionDeploymentDescriptor}} directly from
> {{Execution#producedPartitions}} without checking whether it has been
> assigned.
> This may lead to a task being deployed without its result partitions, and
> eventually cause the job to hang.
> It is not problematic at the moment when using Flink's default shuffle master
> {{NettyShuffleMaster}}, since it returns a completed future on registration.
> However, if that behavior changes or if users are using a customized
> {{ShuffleMaster}}, it may cause problems.
> Besides that, {{Execution#producedPartitions}} is also used for
>  * generating downstream task input descriptors
>  * retrieving {{ResultPartitionID}} for partition releasing
> To avoid these issues, we may need to change all the usages of
> {{Execution#producedPartitions}} to a callback style, e.g. change
> {{Execution#producedPartitions}} from
> {{Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>}} to
> {{CompletableFuture<Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor>>}}
> and adjust all its usages.


