Hi all,

As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a
scalability concern with the reconciliation API. Performing an implicit
reconciliation results in a status update being sent for each task in the
cluster. For large clusters in the tens of thousands of slaves, this can
begin to approach hundreds of thousands of status updates.

With the current design of the driver, status updates must be persisted
before the scheduler returns from the 'statusUpdate' callback, as the
driver sends an acknowledgement implicitly once the call completes. This
design forces the scheduler to synchronously process individual status
updates.

To remedy the issue, we're looking to introduce an option, specified during
construction of the scheduler driver, that controls whether implicit
acknowledgements are sent. If they are disabled, the scheduler must
send acknowledgements through a new 'acknowledgeStatusUpdate' call on the
driver. Explicit acknowledgements allow schedulers to process status
updates asynchronously, outside of the driver thread, and to process
them in batches (e.g. one storage operation covering N status updates).
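
To make the batching idea concrete, here is a rough sketch of how a
scheduler might use explicit acknowledgements. It does not use the real
Mesos bindings: FakeDriver, ListStore, BatchingScheduler, flush, and the
dict-shaped update with a "uuid" key are all hypothetical names made up for
illustration. Only the 'statusUpdate' and 'acknowledgeStatusUpdate' names
come from the proposal, and their real signatures may differ.

```python
import queue
import threading


class FakeDriver:
    """Hypothetical stand-in for the scheduler driver (not the Mesos API)."""

    def __init__(self):
        self.acked = []
        self._lock = threading.Lock()

    def acknowledgeStatusUpdate(self, update):
        # Record the acknowledgement; a real driver would notify the master.
        with self._lock:
            self.acked.append(update["uuid"])


class ListStore:
    """Hypothetical persistent store that counts its write operations."""

    def __init__(self):
        self.rows = []
        self.writes = 0

    def write_batch(self, updates):
        # One storage operation covers every update in the batch.
        self.rows.extend(updates)
        self.writes += 1


class BatchingScheduler:
    """Queues updates in the callback and acknowledges them after a batch write."""

    def __init__(self, driver, store, batch_size=100):
        self.driver = driver
        self.store = store
        self.batch_size = batch_size
        self.pending = queue.Queue()

    def statusUpdate(self, update):
        # Runs on the driver thread: enqueue only, no storage I/O here.
        self.pending.put(update)

    def flush(self):
        # Runs outside the driver thread: drain up to batch_size updates.
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0
        # Persist first, then acknowledge, so no acknowledged update is lost.
        self.store.write_batch(batch)
        for update in batch:
            self.driver.acknowledgeStatusUpdate(update)
        return len(batch)
```

In this sketch the callback returns immediately, and a separate worker
would call flush() periodically. Acknowledging only after the batch write
succeeds keeps the property that an update is persisted before it is
acknowledged.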

As part of the change, the underlying UUID of the status update needs to be
exposed to the scheduler, which requires an update to the signature of
'statusUpdate'. This means that when schedulers include the new
headers/JAR/egg, they need to adjust their code to accept the new UUID
argument, regardless of whether implicit acknowledgements are desired (to
my knowledge, there is no way to expose the UUID without requiring
schedulers to update their code, because of Java's interface semantics).

I'd like to get this change landed for 0.22.0 to make reconciliation usable
for large clusters. The patches are up on MESOS-2347. I've outlined the
compatibility details and upgrade steps in
https://reviews.apache.org/r/30978/

Please share any high-level feedback or concerns!

Ben
