[ https://issues.apache.org/jira/browse/FLINK-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933121#comment-16933121 ]
Till Rohrmann commented on FLINK-14074:
---------------------------------------

So maybe it is the same problem as with FLINK-13241.

> MesosResourceManager can't create new taskmanagers in Session Cluster Mode.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-14074
>                 URL: https://issues.apache.org/jira/browse/FLINK-14074
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Mesos
>    Affects Versions: 1.9.0
>         Environment: Flink HA Session cluster 1.9.0 on mesos.
>            Reporter: Alexander Kasyanenko
>            Priority: Major
>
> Hi, I'm trying to launch multiple jobs in a Flink Session Cluster deployed on
> Mesos. Flink's version is 1.9.0.
> The very first resource allocation completes successfully, and the first
> submitted job launches, but submitting any number of jobs afterwards doesn't
> affect the cluster in any way, and no additional TaskManagers are allocated.
> From the logs I see that MesosResourceManager is requesting slots for the
> newly submitted jobs: "{{o.a.f.m.r.c.MesosResourceManager - Request slot
> with profile ResourceProfile...}}", but the line "{{Starting a new worker.}}"
> appears in the log only as many times as the number of TaskManagers allocated
> for the first job.
> I'm a complete noob in Flink internals, but I took a wild guess at the cause.
> I think the problem is in this check:
> [https://github.com/apache/flink/blob/release-1.9.0/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosResourceManager.java#L436]
> It might be that the RM is lazily created by a factory on the first call, and
> the private final field {{slotsPerWorker}} is set at that point. This check
> would then prevent the creation of any new worker once the iterator has
> traversed the entire collection. My main assumption is that
> {{slotsPerWorker}} is never modified again.
>
> I'm sorry that I didn't do much investigation before reporting, but I'll try
> to do some after the weekend. I plan to build Flink without this check and
> see if it helps. I'll also play around with the tests for this RM. Since it's
> my first time digging into Flink internals, I'll be back in a few days.
> Any help will be much appreciated.
> Thanks in advance.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
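The reporter's guess turns on whether a check over a `final` collection can stop matching once "the iterator traverses the entire collection". The code at `MesosResourceManager.java#L436` is not reproduced in this thread, so the sketch below is only a hypothetical, self-contained model: `GuardModel`, `Profile`, `startNewWorkerFresh` and `startNewWorkerShared` are illustrative names, not Flink's API. It contrasts a guard that asks the collection for a fresh iterator on every call with one that keeps a single iterator in a field.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical model, NOT Flink code: Profile, startNewWorkerFresh and
// startNewWorkerShared are illustrative stand-ins for the guard the reporter
// suspects; the real check at MesosResourceManager.java#L436 may differ.
public class GuardModel {

    /** Simplified stand-in for a resource profile: only a memory size in MB. */
    static final class Profile {
        final int memoryMb;

        Profile(int memoryMb) {
            this.memoryMb = memoryMb;
        }

        /** A slot matches a request if it offers at least the requested memory. */
        boolean isMatching(Profile requested) {
            return memoryMb >= requested.memoryMb;
        }
    }

    // Set once in the constructor and never modified afterwards.
    private final List<Profile> slotsPerWorker;

    // Only used by variant B: a single iterator kept in a field across calls.
    private final Iterator<Profile> sharedIterator;

    GuardModel(List<Profile> slotsPerWorker) {
        this.slotsPerWorker = Collections.unmodifiableList(new ArrayList<>(slotsPerWorker));
        this.sharedIterator = this.slotsPerWorker.iterator();
    }

    /** Variant A: calls iterator() on each request, so every call starts at the first element. */
    Collection<Profile> startNewWorkerFresh(Profile requested) {
        if (!slotsPerWorker.iterator().next().isMatching(requested)) {
            return Collections.emptyList();
        }
        return slotsPerWorker;
    }

    /** Variant B: advances one shared iterator, which runs dry after one pass over the list. */
    Collection<Profile> startNewWorkerShared(Profile requested) {
        if (!sharedIterator.hasNext() || !sharedIterator.next().isMatching(requested)) {
            return Collections.emptyList();
        }
        return slotsPerWorker;
    }

    public static void main(String[] args) {
        GuardModel rm = new GuardModel(List.of(new Profile(1024), new Profile(1024)));
        Profile request = new Profile(512);
        for (int call = 1; call <= 4; call++) {
            boolean fresh = !rm.startNewWorkerFresh(request).isEmpty();
            boolean shared = !rm.startNewWorkerShared(request).isEmpty();
            System.out.println("call " + call + ": fresh=" + fresh + ", shared=" + shared);
        }
    }
}
```

With a two-element list, variant A accepts every matching request, while variant B stops after the second call. In this model the "exhausted iterator" explanation only holds if the check keeps one iterator alive across calls; a guard that calls `iterator()` afresh on each request is unaffected by earlier traversals, which is consistent with looking elsewhere (e.g. FLINK-13241) for the cause.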