[ https://issues.apache.org/jira/browse/FLINK-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237161#comment-16237161 ]

ASF GitHub Bot commented on FLINK-7870:
---------------------------------------

Github user shuai-xu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4887#discussion_r148715651
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java ---
    @@ -302,7 +302,12 @@ public boolean unregisterSlotRequest(AllocationID allocationId) {
    		PendingSlotRequest pendingSlotRequest = pendingSlotRequests.remove(allocationId);
    
    		if (null != pendingSlotRequest) {
    -			cancelPendingSlotRequest(pendingSlotRequest);
    +			if (pendingSlotRequest.isAssigned()) {
    +				cancelPendingSlotRequest(pendingSlotRequest);
    +			}
    +			else {
    +				resourceActions.cancelResourceAllocation(pendingSlotRequest.getResourceProfile());
    --- End diff --
    
    Yes, the SlotManager can decide to release more resources than needed. But consider a worst case:
    1. The Mesos or YARN cluster does not have enough resources.
    2. A job asks for 100 workers of size A.
    3. As there are not enough resources, the job fails over; the previous 100 requests are not cancelled, and it asks for another 100.
    4. This repeats several times, until the pending requests for workers of size A reach 10000.
    5. A worker of size B crashes, so the job now only needs 100 workers of size A and 1 worker of size B. But YARN or Mesos thinks the job needs 10000 A and 1 B, as the requests are never cancelled.
    6. YARN/Mesos now has resources for 110 A, more than the 100 A and 1 B actually needed, and it begins to assign resources to the job. But it first tries to allocate 10000 containers of size A, so the job still cannot start because it lacks the container for B.
    7. This can keep the job from starting for a long time, even after the cluster has enough resources.
    8. And this did happen in our cluster, as our cluster is busy. So I think it's better to keep this protocol, and different resource managers can treat it according to their needs.


> SlotPool should cancel the slot request to the RM if it is not needed any more.
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-7870
>                 URL: https://issues.apache.org/jira/browse/FLINK-7870
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>            Reporter: shuai.xu
>            Assignee: shuai.xu
>            Priority: Major
>              Labels: flip-6
>
> 1. SlotPool will request slots from the RM if its own slots are not enough.
> 2. If a slot request is not fulfilled within a certain time, SlotPool will 
> treat the request as timed out and send a new slot request by triggering a 
> failover in JobMaster. The previous request is not needed any more, but the 
> RM does not know it.
> 3. This may cause the RM to request much more resource than the job really 
> needs.
> For example:
> 1. A job needs 100 slots. The RM requests 100 containers from YARN.
> 2. But YARN is busy now; it has no resources for the job.
> 3. The job fails over as the resource request is not fulfilled in time.
> 4. It asks for 100 slots again; now the RM has requested 200 containers 
> from YARN.
> 5. If the failover happens several times, the container requests will keep 
> growing.
> 6. Once YARN has resources, it will find that the job appears to need 
> thousands of containers. This is a waste of resources.
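The accumulation described in the example above can be sketched with a small simulation. This is a hedged illustration only: the `pendingAfterFailovers` helper and the numbers are hypothetical, not Flink code; it simply shows how the RM's outstanding container count at YARN grows per failover when stale requests are never cancelled, versus staying at the job's real need when they are.

```java
// Toy simulation of pending container requests at YARN across failovers.
public class PendingRequestGrowth {

    // 'need' containers are re-requested on the initial attempt and after
    // each failover; with cancellation, the stale requests are withdrawn
    // before the new round is sent.
    static int pendingAfterFailovers(int need, int failovers, boolean cancelOnTimeout) {
        int pending = 0;
        for (int attempt = 0; attempt <= failovers; attempt++) {
            if (cancelOnTimeout) {
                pending = 0;  // SlotPool cancels the timed-out requests at the RM
            }
            pending += need;  // new round of slot/container requests
        }
        return pending;
    }

    public static void main(String[] args) {
        // Without cancellation the requests pile up linearly per failover.
        System.out.println(pendingAfterFailovers(100, 4, false));  // 500
        // With cancellation the pending count tracks the job's real need.
        System.out.println(pendingAfterFailovers(100, 4, true));   // 100
    }
}
```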



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)