[ https://issues.apache.org/jira/browse/FLINK-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929193#comment-16929193 ]
Alexander Kasyanenko commented on FLINK-14074:
----------------------------------------------

Hi [~till.rohrmann], thanks for replying.

All {{ResourceProfiles}} for requested slots are of {{UNKNOWN}} type:
{{ResourceProfile\{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=-1, nativeMemoryInMB=-1, networkMemoryInMB=-1, managedMemoryInMB=-1}}}

Sorry, I can't share the log at the moment, but if you want, I can prepare an example log next week.

> MesosResourceManager can't create new taskmanagers in Session Cluster Mode.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-14074
>                 URL: https://issues.apache.org/jira/browse/FLINK-14074
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Mesos
>    Affects Versions: 1.9.0
>         Environment: Flink HA Session cluster 1.9.0 on Mesos.
>            Reporter: Alexander Kasyanenko
>            Priority: Major
>
> Hi, I'm trying to launch multiple jobs in a Flink Session Cluster deployed on
> Mesos.
> Flink's version is 1.9.0.
> The very first resource allocation completes successfully and the first
> submitted job launches, but submitting any number of jobs afterwards doesn't
> affect the cluster in any way, and no additional TaskManagers are allocated.
> From the logs I see that MesosResourceManager is requesting slots for the
> newly submitted jobs: {{o.a.f.m.r.c.MesosResourceManager - Request slot
> with profile ResourceProfile...}}, but the line {{Starting a new worker.}}
> appears in the log only as many times as the number of TaskManagers allocated
> for the first job.
> I'm a complete noob in Flink internals, but I took a wild guess at the cause.
> I think that the problem is in this check:
> [https://github.com/apache/flink/blob/release-1.9.0/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosResourceManager.java#L436]
> It might be that the RM is created lazily by a factory on the first call, at
> which point the private final field {{slotsPerWorker}} is set.
> So this check would prevent the creation of any new worker once the iterator
> has traversed the entire collection. My main assumption is that
> {{slotsPerWorker}} is never modified again.
>
> I'm sorry that I didn't do much investigation before reporting, but I'll
> try to do some after the weekend. I plan to build Flink without this check
> and see if it helps. I'll also play around with the tests for this RM. Since
> it's my first time looking into Flink internals, I'll be back in a few days.
> Any help will be much appreciated.
> Thanks in advance.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
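The iterator hypothesis in the report can be checked in isolation. The following is a minimal, hypothetical Java sketch, not Flink's code: the class {{SlotsPerWorkerSketch}}, its methods, and the string stand-in for a resource profile are invented for illustration, and the actual check at the linked line is not reproduced here. It contrasts a final {{Collection}} field, where each {{iterator()}} call returns a fresh iterator, with a final {{Iterator}} field, which stays exhausted after one traversal.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;

/**
 * Hypothetical model (NOT Flink's MesosResourceManager) of a "slots per worker"
 * check guarded by a final field. All names are invented for illustration.
 */
class SlotsPerWorkerSketch {

    // Hypothetical stand-in for a resource profile; only a name is modeled.
    static final String DEFAULT_PROFILE = "default";

    // Variant A: a final Collection. Each call to iterator() creates a new
    // iterator, so every check sees the first element again.
    private final Collection<String> slotsPerWorker =
            Collections.singletonList(DEFAULT_PROFILE);

    // Variant B: a final Iterator created once. After its single element is
    // consumed, hasNext() stays false, so every later check fails.
    private final Iterator<String> sharedIterator =
            Collections.singletonList(DEFAULT_PROFILE).iterator();

    /** Variant A: returns true if a new worker would be started. */
    boolean canStartWorkerWithCollection(String requested) {
        return slotsPerWorker.iterator().next().equals(requested);
    }

    /** Variant B: returns true if a new worker would be started. */
    boolean canStartWorkerWithSharedIterator(String requested) {
        return sharedIterator.hasNext() && sharedIterator.next().equals(requested);
    }

    public static void main(String[] args) {
        SlotsPerWorkerSketch rm = new SlotsPerWorkerSketch();
        // Simulate three job submissions, each requesting one new worker.
        for (int job = 1; job <= 3; job++) {
            System.out.println("job " + job
                    + ": collection=" + rm.canStartWorkerWithCollection(DEFAULT_PROFILE)
                    + ", sharedIterator=" + rm.canStartWorkerWithSharedIterator(DEFAULT_PROFILE));
        }
    }
}
```

Running {{main}} prints {{true}} for both variants on the first request, then {{collection=true, sharedIterator=false}} for later requests. Only the second pattern would match the behavior described in the report, so whether the real field is consumed this way still has to be confirmed against the source.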