The timeout behavior sounds like a dangerous scalability tripwire. Consider revisiting that approach.
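On the dedup idea discussed below: since the master resends a status update with the same identity until it is acknowledged, collapsing retries before they reach the processing pool is cheap. A minimal Python sketch, assuming a `(task_id, uuid)` pair identifies a retry (the dict field names here are illustrative, not the Mesos API):

```python
from collections import OrderedDict
from threading import Lock

class DedupQueue:
    """FIFO queue of status updates that collapses duplicate retries.

    Assumption: the master resends an unacknowledged status update with
    the same (task_id, uuid), so that pair is used as the dedup key.
    """

    def __init__(self):
        self._lock = Lock()
        self._pending = OrderedDict()  # (task_id, uuid) -> update dict

    def offer(self, update):
        """Enqueue the update; drop it if an identical retry is pending."""
        key = (update["task_id"], update["uuid"])
        with self._lock:
            if key in self._pending:
                return False  # duplicate retry, already queued once
            self._pending[key] = update
            return True

    def take(self):
        """Pop the oldest pending update (FIFO order), or None if empty."""
        with self._lock:
            if not self._pending:
                return None
            _key, update = self._pending.popitem(last=False)
            return update
```

With this in place the scheduler processes and acks each distinct update once; on a failed ack the master resends, and the retry is simply re-offered.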
On Sun, Oct 28, 2018 at 10:42 PM Varun Gupta <var...@uber.com.invalid> wrote:

> Mesos Version: 1.6
>
> Yes, the scheduler has 250k events in its queue: the Mesos master sends
> status updates to the scheduler, and the scheduler stores them in a queue.
> The scheduler processes them in FIFO order, and once an update is processed
> (including persisting it to the DB) it acks the update. These updates are
> processed asynchronously with a thread pool of size 1000. We are using
> explicit reconciliation.
>
> If an ack to the Mesos master times out due to high CPU usage, the next
> ack will likely fail too. This slows down processing on the scheduler side,
> while the Mesos master continues to send status updates (duplicates, since
> the old status updates were not acked). This leads to a buildup of status
> updates to be processed at the scheduler, and we have seen it grow up to
> 250k status updates.
>
> The timeout is on the explicit ack request from the scheduler to the Mesos
> master.
>
> Mesos master profiling: the next time this issue occurs, I will take the
> dump.
>
> Deduplication is for the status updates present in the scheduler's queue:
> the idea is to dedup duplicate status updates so that the scheduler
> processes a given status update pending in the queue only once, and acks
> it to the Mesos master only once. This would reduce the load on both the
> scheduler and the Mesos master. After the ack (success/failure) the
> scheduler removes the status update from the queue, and in case of failure
> the Mesos master will send the status update again.
>
> On Sun, Oct 28, 2018 at 10:15 PM Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > Which version of mesos are you running?
> >
> > > In framework, event updates grow up to 250k
> >
> > What does this mean? The scheduler has 250k events in its queue?
> >
> > > which leads to cascading effect on higher latency at Mesos Master (ack
> > > requests with 10s timeout)
> >
> > Can you send us perf stacks of the master during such a time window so
> > that we can see if there are any bottlenecks?
> > http://mesos.apache.org/documentation/latest/performance-profiling/
> >
> > Where is this timeout coming from and how is it used?
> >
> > > simultaneously explore if dedup is an option
> >
> > I don't know what you're referring to in terms of de-duplication. Can
> > you explain how the scheduler's status update processing works? Does it
> > use explicit acknowledgements and process batches asynchronously? Aurora
> > example: https://reviews.apache.org/r/33689/
> >
> > On Sun, Oct 28, 2018 at 8:58 PM Varun Gupta <var...@uber.com.invalid>
> > wrote:
> >
> >> Hi Benjamin,
> >>
> >> In our batch workload use case, task churn is pretty high. We have seen
> >> 20-30k tasks launched within a 10-second window and 100k+ tasks
> >> running.
> >>
> >> In the framework, event updates grow up to 250k, which leads to a
> >> cascading effect of higher latency at the Mesos master (ack requests
> >> with a 10s timeout), as well as blocking the framework from processing
> >> new updates, since there are too many left to be acknowledged.
> >>
> >> Reconciliation runs every 30 minutes, which also adds pressure on the
> >> event stream if too many updates are unacknowledged.
> >>
> >> I am thinking of experimenting with raising the default backoff period
> >> from 10s to 30s or 60s, and simultaneously exploring whether dedup is
> >> an option.
> >>
> >> Thanks,
> >> Varun
> >>
> >> On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler <bmah...@apache.org>
> >> wrote:
> >>
> >> > Hi Varun,
> >> >
> >> > What problem are you trying to solve precisely? There seems to be an
> >> > implication that the duplicate acknowledgements are expensive. They
> >> > should be low cost, so that's rather surprising. Do you have any data
> >> > related to this?
> >> >
> >> > You can also tune the backoff rate on the agents, if the defaults are
> >> > too noisy in your setup.
> >> >
> >> > Ben
> >> >
> >> > On Sun, Oct 28, 2018 at 4:51 PM Varun Gupta <var...@uber.com> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> A Mesos agent will send status updates with exponential backoff
> >> >> until an ack is received.
> >> >>
> >> >> If processing of events at the framework and sending acks to the
> >> >> master is running slow, then back pressure builds at the framework
> >> >> due to duplicate updates for the same status.
> >> >>
> >> >> Has someone explored the option of deduping the same status update
> >> >> event at the framework, or is it even advisable to do so? The end
> >> >> goal is to dedup all events and send only one ack back to the
> >> >> master.
> >> >>
> >> >> Thanks,
> >> >> Varun
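P.S. On the agent-side retry cadence mentioned at the start of the thread: the duplicates arrive on a doubling backoff schedule, so the duplicate rate for a stuck ack falls off quickly over time. A small sketch of that shape (the 10s initial interval and 10-minute cap below are assumptions for illustration, not confirmed agent constants; check your agent's retry settings):

```python
def retry_schedule(initial=10.0, cap=600.0, attempts=8):
    """Return the delays (seconds) of a doubling backoff capped at `cap`.

    Assumed values: 10s initial interval, 600s (10 min) cap. This mirrors
    the shape of the agent's status-update retry, not its exact constants.
    """
    delay, out = initial, []
    for _ in range(attempts):
        out.append(delay)
        delay = min(delay * 2, cap)
    return out
```

Raising the initial interval (e.g. 30s or 60s, as proposed above) shifts the whole schedule right, trading faster delivery under transient failures for fewer duplicates when the scheduler is backed up.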