More on the queueing proposal:

One problem with a single unified overflow topic is the case of multiple controllers WITH shared data, where:
- all controllers are in overflow state
- activation A1 arrives at controller C0, a timeout is scheduled, and the message is sent to the overflow topic
- controller C1 gets a view on invokers with capacity BEFORE controller C0 does, and begins processing the overflow topic
- activation A1 is now being processed by C1 (but the request and initial timeout are being handled on controller C0)
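The race and handoff above can be sketched in miniature. This is plain Python with hypothetical names (Controller, publish_overflow, poll_overflow), not OpenWhisk code: a saturated controller publishes to a shared overflow topic, and whichever controller first regains capacity claims the activation, with the timeout reduced by the time already spent waiting.

```python
from collections import deque

TIMEOUT = 60.0
overflow = deque()                     # stands in for the single overflow topic

class Controller:
    def __init__(self, name, capacity=0):
        self.name = name
        self.capacity = capacity       # free invoker slots visible to this controller

def publish_overflow(origin, activation_id, now):
    # origin controller is saturated: record origin + enqueue time on the message
    overflow.append((activation_id, origin.name, now))

def poll_overflow(controller, now):
    # any controller with capacity may process the overflow topic
    if not (controller.capacity > 0 and overflow):
        return None
    activation_id, origin_name, enqueued = overflow.popleft()
    remaining = TIMEOUT - (now - enqueued)   # account for time already spent waiting
    if remaining <= 0:
        # timed out while queued: a failed completion would be sent to the
        # origin controller's completion topic
        return (activation_id, origin_name, "timed-out")
    controller.capacity -= 1
    # origin_name would be noted in the LoadBalancerData entry so completions
    # can be forwarded to the origin controller's completion topic
    return (activation_id, origin_name, "scheduled")

c0 = Controller("C0")                  # received the request, saturated
c1 = Controller("C1", capacity=1)      # sees invoker capacity first
publish_overflow(c0, "A1", now=0.0)
print(poll_overflow(c1, now=5.0))      # ('A1', 'C0', 'scheduled')
```

The key point the sketch illustrates: the processing controller (C1) is not the origin controller (C0), so the origin must travel with the message and the timeout must be computed from the original enqueue time, not from when C1 picks it up.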
For this case, I am thinking that it should be possible to use a different controller to process the activation than the controller that originally received the request/activation(!). To do this:
- the timeout scheduled by the overflow-processing controller would account for time already spent waiting for invokers (a failed completion message would be sent to the initial controller's completion topic in case of timeout)
- when the activation is scheduled to an invoker, the original controller (if different) must be noted in the LoadBalancerData entry
- successful completion messages from the invoker would be sent to the processing controller's completion topic (and THEN be forwarded to the initial controller's completion topic), since both controllers are waiting on the timeout for that activation's processing in the invoker

For the case of multiple controllers WITHOUT shared data: I think the same approach works, so overflow activations would be processed ASAP by any available controller + any available invoker, as opposed to the current controller waiting for its own in-flight activations to complete before overflow is processed. (The only difference is that invokers will be overscheduled, due to the inability of controllers to view other controllers' scheduled activations.)

I'll try to come up with a diagram to describe this, but wanted to mention it to see if people have feedback on the idea in the meantime.

Thanks
Tyson

> On Oct 10, 2017, at 10:34 AM, Markus Thömmes <markusthoem...@me.com> wrote:
>
> Heyho,
>
> I ran into the same issue before and I think our scheduling code should be an Actor. We could microbenchmark it to assure it can happily schedule a large amount of actions per second to not become a bottleneck.
>
> +1 for actorizing the LB
>
> Cheers,
> Markus
>
> Sent from my iPhone
>
>> Am 10.10.2017 um 13:28 schrieb Tyson Norris <tnor...@adobe.com.INVALID>:
>>
>> Hi -
>> Following up on this, I've been working on a PR.
>>
>> One issue I've run into (which may be problematic in other scheduling scenarios) is that the scheduling in LoadBalancerService doesn't respect the new async nature of activation counting in LoadBalancerData. At least I think this is a good description.
>>
>> Specifically, I am creating a test that submits activations via LoadBalancer.publish, and I end up with 10 activations scheduled on invoker0, even though I use an invokerBusyThreshold of 1.
>> It would only occur when concurrent requests (or requests with a very short time between them?) arrive at the same controller, I think. (Otherwise the counts can sync up quickly enough.)
>> I'll work more on testing it.
>>
>> Assuming this (dealing with async counters) is the problem, I'm not exactly sure how to deal with it. Some options may include:
>> - change LoadBalancer to an actor, so that local counter state can be managed more easily (it would still need to replicate, but at least locally it would do the right thing)
>> - coordinate the schedule + setupActivation calls to also rely on some local state for activations that should be counted but have not yet been processed within LoadBalancerData
>>
>> Any suggestions in this area would be great.
>>
>> Thanks
>> Tyson
>>
>>> On Oct 6, 2017, at 11:04 AM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:
>>>
>>> With many invokers, there is less data exposed to rebalancing operations, since the invoker topics will only ever receive enough activations that can be processed "immediately", currently set to 16. The single backlog topic would only be consumed by the controller (not any invoker), and the invokers would only consume their respective "process immediately" topic, which effectively has no, or very little, backlog (16 max). My suggestion is that having multiple backlogs is an unnecessary problem, regardless of how many invokers there are.
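The over-scheduling symptom described above (10 activations on invoker0 despite invokerBusyThreshold = 1) can be reproduced in miniature. This is a plain-Python illustration of the race, not OpenWhisk code: when the counter a scheduling decision reads is only updated asynchronously, every concurrent decision sees the stale count; serializing decisions through a single actor-style mailbox fixes it.

```python
threshold = 1        # stands in for invokerBusyThreshold

# 1) Async counting: all 10 decisions read count == 0 before any update lands.
count = 0
scheduled = []
for i in range(10):
    if count < threshold:      # stale view: the async updates have not arrived yet
        scheduled.append(i)
count = len(scheduled)         # updates are applied only after the fact
print(len(scheduled))          # 10 activations land on invoker0

# 2) Actor-style fix: decisions are processed one at a time, so each decision
# observes the effect of the previous one before it runs.
count = 0
scheduled_serial = []
for i in range(10):
    if count < threshold:
        count += 1             # counter updated before the next decision runs
        scheduled_serial.append(i)
print(len(scheduled_serial))   # 1
```

An actor gives exactly the second behavior for free: publish messages queue in its mailbox and the counter check-and-increment happens atomically per message, with replication to other controllers still done asynchronously.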
>>>
>>> It is worth noting the case of multiple controllers as well, where multiple controllers may be processing the same backlog topic. I don't think this should cause any more trouble than the distributed activation counting that should be enabled via controller clustering, but it may mean that if one controller enters overflow state, it should signal that ALL controllers are now in overflow state, etc.
>>>
>>> Regarding "timeout", I would plan to use the existing timeout mechanism, where an ActivationEntry is created immediately, regardless of whether the activation is going to be processed or added to the backlog. At the time of processing the backlog message, if the entry has timed out, throw it away. (The entry map may need to be shared in the case where multiple controllers are in use and they all consume from the same topic; alternatively, we can partition the topic so that entries are only processed by the controller that backlogged them.)
>>>
>>> Yes, once invokers are saturated and backlogging begins, I think all incoming activations should be sent straight to the backlog (we already know that no invokers are available). This should not hurt overall performance any more than it currently does, and should be better (since the first available invoker can start taking work, instead of waiting on a specific invoker to become available).
>>>
>>> I'm working on a PR; I think many of these details will come out there, but in the meantime, let me know if any of this doesn't make sense.
>>>
>>> Thanks
>>> Tyson
>>>
>>> On Oct 5, 2017, at 2:49 PM, David P Grove <gro...@us.ibm.com<mailto:gro...@us.ibm.com>> wrote:
>>>
>>> I can see the value in delaying the binding of activations to invokers when the system is loaded (can't execute "immediately" on its target invoker).
>>>
>>> Perhaps in ignorance, I am a little worried about the scalability of a single backlog topic.
With a few hundred invokers, it seems like we'd be exposed to frequent and expensive partition rebalancing operations as invokers crash/restart. Maybe if we have N = K*M invokers, we can get away with M backlog topics, each being read by K invokers. We could still get imbalance across the different backlog topics, but it might be good enough.
>>>
>>> I think we'd also need to do some thinking about how to ensure that work put in a backlog topic doesn't languish there for a really long time. Once we start having work in the backlog, do we need to stop putting work in the immediately topics? If we do, that could hurt overall performance. If we don't, how will the backlog topic ever get drained if most invokers are kept busy servicing their immediately topics?
>>>
>>> --dave
>>>
>>> From: Tyson Norris <tnor...@adobe.com.INVALID<mailto:tnor...@adobe.com.INVALID>>
>>> To: "dev@openwhisk.apache.org<mailto:dev@openwhisk.apache.org>" <dev@openwhisk.apache.org<mailto:dev@openwhisk.apache.org>>
>>> Date: 10/04/2017 07:45 PM
>>> Subject: Invoker activation queueing proposal
>>>
>>> Hi -
>>>
>>> I've been discussing a bit with a few people about optimizing the queueing that goes on ahead of invokers, so that things behave more simply and predictably.
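Dave's N = K*M partitioning idea above can be sketched quickly. This is an illustrative Python fragment with made-up names (backlog_topic), not a proposed implementation: assigning invoker i to topic i % M gives M backlog topics with exactly K consumers each, keeping each consumer group, and hence each rebalance, small.

```python
from collections import Counter

K, M = 4, 3
N = K * M                              # total invokers

def backlog_topic(invoker_index):
    # invoker i reads backlog topic (i mod M)
    return f"backlog-{invoker_index % M}"

readers = Counter(backlog_topic(i) for i in range(N))
print(dict(readers))                   # each of the M topics has exactly K readers
```

The trade-off Dave notes still holds in the sketch: work is statically split across the M topics, so one topic can back up while another is empty; the bound is on rebalance cost, not on balance of load.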
>>>
>>> In short: instead of scheduling activations to an invoker on receipt, do the following:
>>> - execute the activation "immediately" if capacity is available
>>> - provide a single overflow topic for activations that cannot execute "immediately"
>>> - schedule from the overflow topic when capacity is available
>>>
>>> (BTW, "immediately" means: still queued via the existing invoker topics, but ONLY queued there when the invoker is not fully loaded and therefore should execute it "very soon")
>>>
>>> Later: it would also be good to provide more container state data from invoker to controller, to get better scheduling options - e.g. if some invokers can handle running more containers than other invokers, that info can be used to avoid over/under-loading the invokers (currently we assume each invoker can handle 16 activations, I think)
>>>
>>> I put a wiki page proposal here:
>>> https://cwiki.apache.org/confluence/display/OPENWHISK/Invoker+Activation+Queueing+Change
>>>
>>> WDYT?
>>>
>>> Thanks
>>> Tyson
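The publish path proposed in the original message reduces to a small decision. The sketch below uses hypothetical names (choose_topic, SLOTS_PER_INVOKER), not the actual LoadBalancerService API: an activation goes to an invoker topic only when that invoker is not fully loaded, and otherwise to the single overflow topic, from which it is scheduled later once capacity appears.

```python
SLOTS_PER_INVOKER = 16   # the per-invoker capacity assumption noted in the thread

def choose_topic(invoker_loads):
    # invoker_loads: invoker topic name -> in-flight activation count
    for invoker, load in sorted(invoker_loads.items()):
        if load < SLOTS_PER_INVOKER:
            return invoker        # "immediately": invoker should run it very soon
    return "overflow"             # no capacity anywhere: single overflow topic

print(choose_topic({"invoker0": 16, "invoker1": 3}))    # invoker1
print(choose_topic({"invoker0": 16, "invoker1": 16}))   # overflow
```

With richer container-state data from the invokers, SLOTS_PER_INVOKER would become a per-invoker value rather than a global constant, which is the later refinement the message suggests.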