[jira] [Updated] (FLINK-7469) Handle slot requests occurring before RM registration completes

2017-08-17 Thread Eron Wright (JIRA)

 [ https://issues.apache.org/jira/browse/FLINK-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eron Wright updated FLINK-7469:

Attachment: jm.log, taskmanager-3.log

> Handle slot requests occurring before RM registration completes
> --
>
> Key: FLINK-7469
> URL: https://issues.apache.org/jira/browse/FLINK-7469
> Project: Flink
> Issue Type: Sub-task
> Components: Cluster Management
> Reporter: Eron Wright
> Priority: Minor
> Attachments: jm.log, taskmanager-3.log
>
>
> Occasionally the TM-to-RM registration ask times out, causing the TM to pause
> registration for 10 seconds. Meanwhile, the registration may actually have
> succeeded in the RM. Slot requests may then arrive at the TM while, from the
> TM's point of view, RM registration is still incomplete.
> The current behavior appears to be that the TM honors the slot request.
> Please determine whether this is a feature or a bug. If it is a feature, a
> slot request should perhaps implicitly complete the registration.
> See the attached log, which shows one TM exhibiting the described behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7469) Handle slot requests occurring before RM registration completes

2017-08-17 Thread Eron Wright (JIRA)

 [ https://issues.apache.org/jira/browse/FLINK-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eron Wright updated FLINK-7469:

Description: 
*Description*
Occasionally the TM-to-RM registration ask times out, causing the TM to pause
registration for 10 seconds. Meanwhile, the registration may actually have
succeeded in the RM. Slot requests may then arrive at the TM while, from the
TM's point of view, RM registration is still incomplete.

The current behavior appears to be that the TM honors the slot request.
Please determine whether this is a feature or a bug. If it is a feature, a
slot request should perhaps implicitly complete the registration.
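
To make the options concrete, here is a minimal sketch of the three ways a TM could treat a slot request that arrives while it still considers its RM registration pending. Everything in it is hypothetical: the names SlotRequestSketch, TaskManagerSketch, Policy and RegistrationState are invented for illustration and are not Flink classes, and the sketch does not claim to reflect how the TaskExecutor is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the scenario in this issue; not Flink code. */
final class SlotRequestSketch {

    /** How the TM treats a slot request that arrives before registration is confirmed. */
    enum Policy { HONOR, REJECT, IMPLICITLY_COMPLETE }

    /** The TM's own view of its registration with the RM. */
    enum RegistrationState { PENDING, REGISTERED }

    static final class TaskManagerSketch {
        private final Policy policy;
        private final List<String> allocatedSlots = new ArrayList<>();
        private RegistrationState state = RegistrationState.PENDING;

        TaskManagerSketch(Policy policy) {
            this.policy = policy;
        }

        /** The registration ask timed out: the TM stays PENDING until it retries. */
        void onRegistrationTimeout() {
            state = RegistrationState.PENDING;
        }

        /** The RM's acknowledgement reached the TM. */
        void onRegistrationSuccess() {
            state = RegistrationState.REGISTERED;
        }

        /** Returns true if the slot was allocated, false if the request was rejected. */
        boolean requestSlot(String slotId) {
            if (state == RegistrationState.PENDING) {
                if (policy == Policy.REJECT) {
                    return false; // treat the early request as a bug and refuse it
                }
                if (policy == Policy.IMPLICITLY_COMPLETE) {
                    // The RM could only send this if it accepted the registration.
                    state = RegistrationState.REGISTERED;
                }
                // HONOR: allocate anyway and leave the state PENDING.
            }
            allocatedSlots.add(slotId);
            return true;
        }

        RegistrationState state() {
            return state;
        }

        List<String> allocatedSlots() {
            return new ArrayList<>(allocatedSlots);
        }
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            TaskManagerSketch tm = new TaskManagerSketch(policy);
            tm.onRegistrationTimeout();
            boolean accepted = tm.requestSlot("slot-0");
            System.out.println(policy + ": accepted=" + accepted + ", state=" + tm.state());
        }
    }
}
```

In this model, HONOR corresponds to the behavior that the logs appear to show: the slot is allocated while the TM still believes it is unregistered. IMPLICITLY_COMPLETE corresponds to the suggestion above.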

*Example*
See the attached logs, which show a TM exhibiting the described behavior.
The RM launched 12 TMs in parallel, which evidently caused the RM to respond
sluggishly to a couple of the TM registration requests. The logs show that
'00012' and '3' experienced a registration timeout but accepted a slot
request anyway.

  was:

Occasionally the TM-to-RM registration ask times out, causing the TM to pause
registration for 10 seconds. Meanwhile, the registration may actually have
succeeded in the RM. Slot requests may then arrive at the TM while, from the
TM's point of view, RM registration is still incomplete.

The current behavior appears to be that the TM honors the slot request.
Please determine whether this is a feature or a bug. If it is a feature, a
slot request should perhaps implicitly complete the registration.

See the attached log, which shows one TM exhibiting the described behavior.




