Alexander Kasyanenko created FLINK-14074:
--------------------------------------------

             Summary: MesosResourceManager can't create new taskmanagers in 
Session Cluster Mode.
                 Key: FLINK-14074
                 URL: https://issues.apache.org/jira/browse/FLINK-14074
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Mesos
    Affects Versions: 1.9.0
         Environment: Flink HA Session cluster 1.9.0 on mesos.
            Reporter: Alexander Kasyanenko


Hi, I'm trying to launch multiple jobs in a Flink Session Cluster deployed on 
Mesos.
The Flink version is 1.9.0.

The very first resource allocation completes successfully and the first 
submitted job launches, but submitting any number of jobs afterwards doesn't 
affect the cluster in any way, and no additional TaskManagers are allocated.

From the logs I see that MesosResourceManager is requesting slots for the 
newly submitted jobs: {{"o.a.f.m.r.c.MesosResourceManager - Request slot with 
profile ResourceProfile..."}}, but the {{"Starting a new worker."}} log line 
appears only as many times as the number of TaskManagers allocated for the 
first job.
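
One way to reproduce this comparison is to count both messages in the JobManager log. The following is only a minimal sketch: the class name {{WorkerLogCounter}} and the log path argument are assumptions for illustration, and only the two message fragments are taken from the report above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class WorkerLogCounter {

    // Counts the lines that contain the given marker text.
    static long countLines(List<String> lines, String marker) {
        return lines.stream().filter(line -> line.contains(marker)).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WorkerLogCounter <path-to-log>");
            return;
        }
        // The log file location is an assumption; pass the real path as the first argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        long slotRequests = countLines(lines, "Request slot with profile");
        long workerStarts = countLines(lines, "Starting a new worker.");
        System.out.println("slot requests: " + slotRequests);
        System.out.println("worker starts: " + workerStarts);
    }
}
```

If the slot request count keeps growing with each submitted job while the worker start count stays at its initial value, that matches the behaviour described here.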

I'm a complete noob in Flink internals, but I took a wild guess at the reason. 
I think the problem is in this check: 
[https://github.com/apache/flink/blob/release-1.9.0/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosResourceManager.java#L436]

It might be that the RM is lazily allocated at the first call by a factory, 
and then the private final field {{slotsPerWorker}} is set. So this collection 
field will prevent any new worker creation after the original number was 
created.
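
To make the guess concrete, here is a simplified, hypothetical model of how a guard on a field fixed at construction time could silently decline worker requests. It is not the Flink source: {{WorkerGuardSketch}}, {{Profile}} and {{SketchResourceManager}} are made-up names, and the matching rule is an assumption. Only the field name {{slotsPerWorker}} comes from the report.

```java
import java.util.Collection;
import java.util.Collections;

public class WorkerGuardSketch {

    // Hypothetical stand-in for a slot resource profile (not Flink's ResourceProfile).
    static final class Profile {
        final int cpuCores;
        final int memoryMb;

        Profile(int cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        // Assumed rule: this profile can serve a request that needs no more than it offers.
        boolean isMatching(Profile required) {
            return cpuCores >= required.cpuCores && memoryMb >= required.memoryMb;
        }
    }

    // Hypothetical resource manager whose slot layout is fixed when it is created.
    static final class SketchResourceManager {
        private final Collection<Profile> slotsPerWorker;
        private int workersStarted = 0;

        SketchResourceManager(Collection<Profile> slotsPerWorker) {
            this.slotsPerWorker = slotsPerWorker;
        }

        // Returns the slots of the new worker, or an empty collection if the guard declines.
        Collection<Profile> startNewWorker(Profile requested) {
            if (!slotsPerWorker.iterator().next().isMatching(requested)) {
                // No worker is started and nothing is logged here.
                return Collections.emptyList();
            }
            workersStarted++;
            return slotsPerWorker;
        }

        int getWorkersStarted() {
            return workersStarted;
        }
    }
}
```

In this model a declined request produces no worker start, so slot requests would still appear in the log without a matching {{"Starting a new worker."}} line. Whether the linked check behaves this way in 1.9.0 is exactly what still needs to be verified.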

 

I'll try to build Flink without this check and see if it helps. I'll also 
play around with the tests for this RM. Since this is my first time digging 
into Flink internals, I'll be back in a few days (it will take some time, and 
the country I'm in has a national holiday coming up).

Any help will be much appreciated.

Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
