Regarding the backwards-compatibility concern, would it make sense to add a TaskStatusID field to the existing TaskStatus message instead of changing the Scheduler signature?
On Friday, February 13, 2015, Benjamin Mahler <[email protected]> wrote: > Hi all, > > As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a > scalability concern with the reconciliation API. Performing an implicit > reconciliation results in a status update being sent for each task in the > cluster. For large clusters in the tens of thousands of slaves, this can be > begin to approach hundreds of thousands of status updates. > > With the current design of the driver, status updates must be persisted > before the scheduler returns from the 'statusUpdate' callback, as the > driver sends an acknowledgement implicitly once the call completes. This > design forces the scheduler to synchronously process individual status > updates. > > To remedy the issue, we're looking to introduce the ability to optionally > specify whether the implicit acknowledgements are provided (during > construction of the scheduler driver). If disabled, then the scheduler must > send acknowledgments through a new 'acknowledgeStatusUpdate' call on the > driver. Having explicit acknowledgements allows schedulers to process them > asynchronously outside of the driver thread, and allows them to process > updates in batch (e.g. 1:N storage operation:status updates). > > As part of the change, the underlying UUID of the status update needs to be > exposed to the scheduler, which requires an update to the signature of > 'statusUpdate'. What this means is that when schedulers include the new > headers/JAR/egg, they need to adjust their code to accept the new uuid > argument, regardless of whether implicit acknowledgements are desired (to > my knowledge, there is no way to expose the uuid without requiring > schedulers to update their code, because of Java's interface semantics). > > I'd like to get this change landed for 0.22.0 to make reconciliation usable > for large clusters. The patches are up on MESOS-2347. I've outlined the > compatibility details and upgrade steps in > https://reviews.apache.org/r/30978/ > > Please share any high level feedback or concerns! > > Ben > -- Sent from Gmail Mobile
