[ 
https://issues.apache.org/jira/browse/FLINK-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454477#comment-15454477
 ] 

ASF GitHub Bot commented on FLINK-4535:
---------------------------------------

GitHub user beyond1920 opened a pull request:

    https://github.com/apache/flink/pull/2451

    [FLINK-4535] [cluster management] resourceManager process the registration from TaskExecutor

    This pull request implements the ResourceManager's handling of a
    TaskExecutor registration, which includes:
    1. Check whether the input resourceManagerLeaderId is the same as the
    current leadershipSessionId of the resourceManager. If not, two or more
    resourceManagers may exist at the same time and the current
    resourceManager is not the proper one, so it rejects or ignores the
    registration.
    2. Check whether a valid taskExecutor exists at the given address by
    connecting to the address, and reject registrations from an invalid
    address. (This check is hidden in the connect method.)
    3. Keep the mapping between resourceID and taskExecutorGateway, and
    optionally keep the mapping between resourceID and container in YARN mode.
    4. Send a registration-successful ack to the taskExecutor.
    
    The main changes are:
    1. Add UnmatchedLeaderSessionIDException to indicate that the received
    leader session ID is not the same as the expected one.
    2. Change the registerTaskExecutor method of ResourceManager.
    3. Add a test class for ResourceManager.
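
The four processing steps and the new exception described above can be illustrated with a minimal Java sketch. It is not the code from this pull request: RegistrationSketch, ResourceManagerSketch, GatewayConnector, isRegistered, the String-typed resourceID, and the boolean ack are all hypothetical stand-ins, and the exception is unchecked here only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the registration flow; none of these types are Flink's real classes.
final class RegistrationSketch {

    /** Signals that the received leader session ID differs from the expected one. */
    static final class UnmatchedLeaderSessionIDException extends RuntimeException {
        UnmatchedLeaderSessionIDException(UUID expected, UUID actual) {
            super("Unmatched leader session ID: expected " + expected + ", actual " + actual);
        }
    }

    /** Placeholder for the RPC gateway of a connected TaskExecutor. */
    interface TaskExecutorGateway {}

    /** Placeholder for "connect to the address"; returns null when no valid TaskExecutor answers. */
    interface GatewayConnector {
        TaskExecutorGateway connect(String taskExecutorAddress);
    }

    static final class ResourceManagerSketch {
        private final UUID leaderSessionId;
        private final GatewayConnector connector;
        private final Map<String, TaskExecutorGateway> gateways = new HashMap<>();

        ResourceManagerSketch(UUID leaderSessionId, GatewayConnector connector) {
            this.leaderSessionId = leaderSessionId;
            this.connector = connector;
        }

        /** Returns true as the "registration successful" ack, false when the address is rejected. */
        boolean registerTaskExecutor(
                UUID resourceManagerLeaderId, String taskExecutorAddress, String resourceID) {
            // Step 1: fence out requests addressed to a different (stale or competing) leader.
            if (!leaderSessionId.equals(resourceManagerLeaderId)) {
                throw new UnmatchedLeaderSessionIDException(leaderSessionId, resourceManagerLeaderId);
            }
            // Step 2: validate the address by trying to connect to it.
            TaskExecutorGateway gateway = connector.connect(taskExecutorAddress);
            if (gateway == null) {
                return false;
            }
            // Step 3: keep the resourceID -> gateway mapping.
            gateways.put(resourceID, gateway);
            // Step 4: acknowledge the registration.
            return true;
        }

        boolean isRegistered(String resourceID) {
            return gateways.containsKey(resourceID);
        }
    }

    public static void main(String[] args) {
        UUID leader = new UUID(0L, 1L);
        // Assumption for the demo: an empty address means nothing is listening there.
        ResourceManagerSketch rm = new ResourceManagerSketch(
                leader, address -> address.isEmpty() ? null : new TaskExecutorGateway() {});
        System.out.println(rm.registerTaskExecutor(leader, "host:6122", "tm-1")); // true
        System.out.println(rm.registerTaskExecutor(leader, "", "tm-2"));          // false
    }
}
```

Throwing on a leader ID mismatch while returning false for an unreachable address is one possible split; a real implementation might answer both cases with a decline message instead.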

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alibaba/flink jira-4535

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2451
    
----
commit fa795ca7a992859398ed30180e50ef036a93b355
Author: beyond1920 <[email protected]>
Date:   2016-09-01T03:14:00Z

    resourceManager process the registration from TaskExecutor

----


> ResourceManager registration with TaskExecutor
> ----------------------------------------------
>
>                 Key: FLINK-4535
>                 URL: https://issues.apache.org/jira/browse/FLINK-4535
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: zhangjing
>            Assignee: zhangjing
>
> When a TaskExecutor registers at the ResourceManager, it passes the 
> following 3 input parameters:
> 1. resourceManagerLeaderId: the fencing token for the ResourceManager leader, 
> as kept by the taskExecutor that sends the registration
> 2. taskExecutorAddress: the address of the taskExecutor
> 3. resourceID: the resource ID of the TaskExecutor that registers
> The ResourceManager needs to process the registration event in the following 
> steps:
> 1. Check whether the input resourceManagerLeaderId is the same as the current 
> leadershipSessionId of the resourceManager. If not, two or more 
> resourceManagers may exist at the same time and the current resourceManager 
> is not the proper one, so it rejects or ignores the registration.
> 2. Check whether a valid taskExecutor exists at the given address by 
> connecting to the address. Reject registrations from an invalid address.
> 3. Check whether it is a duplicate registration based on the input 
> resourceID; if so, reject the registration.
> 4. Keep the mapping between resourceID and taskExecutorGateway, and 
> optionally keep the mapping between resourceID and container in YARN mode.
> 5. Create the connection between resourceManager and taskExecutor, and ensure 
> it is healthy based on heartbeat RPC calls between rm and tm?
> 6. Send a registration-successful ack to the taskExecutor.
> Discussion:
> Maybe we need to introduce an error code or several registration-decline 
> subclasses to distinguish the different causes of a declined registration. 
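
To make the discussion point concrete, here is a minimal Java sketch of the error-code option: a single decline response that carries a reason, rather than one subclass per cause. Every name in it (DeclineSketch, DeclineReason, RegistrationResponse, Success, Decline, check) is hypothetical and not taken from Flink, and the boolean addressReachable stands in for the result of actually connecting to the taskExecutor.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: one Decline type with a reason code instead of one subclass per cause.
final class DeclineSketch {

    /** One code per rejecting check (steps 1 to 3 of the issue description). */
    enum DeclineReason { UNMATCHED_LEADER_SESSION_ID, INVALID_ADDRESS, DUPLICATE_REGISTRATION }

    static abstract class RegistrationResponse {}

    static final class Success extends RegistrationResponse {}

    static final class Decline extends RegistrationResponse {
        final DeclineReason reason;

        Decline(DeclineReason reason) {
            this.reason = reason;
        }
    }

    /**
     * Runs the three rejecting checks in order and reports the first one that fails.
     * On success the resourceID is added to registeredResourceIds.
     */
    static RegistrationResponse check(
            UUID expectedLeaderId,
            UUID receivedLeaderId,
            boolean addressReachable,
            Set<String> registeredResourceIds,
            String resourceID) {
        if (!expectedLeaderId.equals(receivedLeaderId)) {
            return new Decline(DeclineReason.UNMATCHED_LEADER_SESSION_ID);
        }
        if (!addressReachable) {
            return new Decline(DeclineReason.INVALID_ADDRESS);
        }
        // Set.add returns false when the ID is already present, i.e. a duplicate registration.
        if (!registeredResourceIds.add(resourceID)) {
            return new Decline(DeclineReason.DUPLICATE_REGISTRATION);
        }
        return new Success();
    }

    public static void main(String[] args) {
        UUID leader = new UUID(0L, 1L);
        Set<String> registered = new HashSet<>();
        check(leader, leader, true, registered, "tm-1"); // first registration succeeds
        RegistrationResponse second = check(leader, leader, true, registered, "tm-1");
        System.out.println(((Decline) second).reason); // DUPLICATE_REGISTRATION
    }
}
```

A reason code keeps the response hierarchy flat and is easy to log or switch on; separate subclasses would be the better fit if each cause needed to carry its own extra fields.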



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
