Piyush Narang created FLINK-14158:
-------------------------------------
Summary: Update Mesos configs to add leaseOfferExpiration and
declinedOfferRefuse durations
Key: FLINK-14158
URL: https://issues.apache.org/jira/browse/FLINK-14158
Project: Flink
Issue Type: Bug
Reporter: Piyush Narang
While debugging some Flink on Mesos scheduling issues (tied to our use of Mesos
quotas), we found that we fairly often receive skewed offers that are useless to
us. We are not rejecting these offers fast enough, and we are not telling Mesos
to hold off on re-sending them for long enough. As a result, we end up unable to
schedule our job (~30 Mesos containers) for upwards of an hour.
Fenzo's default is to reject expired, unused Mesos offers after 120s; this can
be overridden via Fenzo's TaskScheduler builder. Additionally, Mesos lets us
override how long it waits before re-sending declined offers (default 5s). We
found that rejecting more aggressively (after 1s instead of 120s) and keeping
rejected offers away for longer (60s instead of 5s) dramatically increases our
chances of scheduling our jobs on Mesos.
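A minimal Java sketch of how the two proposed durations might be read from configuration, using the defaults stated above (120s offer expiration, 5s refuse duration). The issue does not name any config keys, so the `MesosOfferTimeouts` class and both key strings are hypothetical, for illustration only. The comments point at where such values would plausibly be consumed: Fenzo's `TaskScheduler.Builder#withLeaseOfferExpirySecs(long)` and the Mesos `Protos.Filters.Builder#setRefuseSeconds(double)` filter passed to `SchedulerDriver#declineOffer(OfferID, Filters)`.

```java
import java.time.Duration;
import java.util.Map;

/**
 * Hypothetical holder for the two durations discussed in this issue.
 * The key names below are illustrative assumptions, not actual Flink options.
 */
final class MesosOfferTimeouts {
    // Hypothetical config keys (assumed for illustration only).
    static final String LEASE_OFFER_EXPIRATION_KEY = "mesos.offer.lease-expiration";
    static final String DECLINED_OFFER_REFUSE_KEY = "mesos.offer.declined-refuse-duration";

    // Defaults described in the issue: Fenzo expires unused offers after 120s,
    // and Mesos may re-send declined offers after 5s.
    static final Duration DEFAULT_LEASE_OFFER_EXPIRATION = Duration.ofSeconds(120);
    static final Duration DEFAULT_DECLINED_OFFER_REFUSE = Duration.ofSeconds(5);

    final Duration leaseOfferExpiration;
    final Duration declinedOfferRefuse;

    private MesosOfferTimeouts(Duration leaseOfferExpiration, Duration declinedOfferRefuse) {
        this.leaseOfferExpiration = leaseOfferExpiration;
        this.declinedOfferRefuse = declinedOfferRefuse;
    }

    /** Reads both durations (whole seconds) from a flat config map, falling back to defaults. */
    static MesosOfferTimeouts fromConfig(Map<String, String> conf) {
        return new MesosOfferTimeouts(
                readSeconds(conf, LEASE_OFFER_EXPIRATION_KEY, DEFAULT_LEASE_OFFER_EXPIRATION),
                readSeconds(conf, DECLINED_OFFER_REFUSE_KEY, DEFAULT_DECLINED_OFFER_REFUSE));
    }

    private static Duration readSeconds(Map<String, String> conf, String key, Duration fallback) {
        String value = conf.get(key);
        if (value == null) {
            return fallback;
        }
        long seconds = Long.parseLong(value.trim());
        if (seconds < 0) {
            throw new IllegalArgumentException(key + " must be non-negative, got " + seconds);
        }
        return Duration.ofSeconds(seconds);
    }

    /** Value that would be handed to Fenzo's TaskScheduler.Builder#withLeaseOfferExpirySecs. */
    long leaseOfferExpirySecs() {
        return leaseOfferExpiration.getSeconds();
    }

    /** Value that would be handed to Mesos Protos.Filters.Builder#setRefuseSeconds when declining. */
    double refuseSeconds() {
        return declinedOfferRefuse.toMillis() / 1000.0;
    }
}
```

With no overrides this reproduces the current behavior (120s and 5s); setting the two hypothetical keys to "1" and "60" yields the values this issue found to work better. Keeping both durations in one place makes it easy to pass them to the scheduler builder and to every decline call consistently.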