Hi David,

The problem is that if you have N controllers in the system and M
containers, with N > M, and all controllers manage their containers
exclusively, you'll end up with controllers that have no container to
manage at all.
There is valid, very slow workload that needs only one container, for
example a slow cron trigger. Due to the round-robin nature of our
front-door, every controller will eventually receive one of those
requests. Since the controllers are by design not aware of containers
managed by other controllers, each of them would end up asking for a
newly created container. Given N controllers, we would therefore
eventually create at least N containers for any such action. That is
wasteful.
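
To make the arithmetic concrete, here is a tiny, purely illustrative
sketch (nothing in it is actual OpenWhisk code) of what round-robin
dispatch without proxying does for a single slow action:

    // Hypothetical simulation: with exclusive container management and no
    // proxying, every controller that sees the action for the first time
    // requests its own container.
    object RoundRobinWaste extends App {
      val controllers = 5                    // N controllers behind the front-door
      val requests    = 20                   // a slow trigger firing 20 times
      var owners      = Set.empty[Int]       // controllers that created a container

      (0 until requests).foreach { i =>
        val controller = i % controllers     // round-robin dispatch
        if (!owners.contains(controller))
          owners += controller               // no container yet -> request a new one
      }

      println(s"containers created: ${owners.size}")  // prints 5, i.e. N, for one action
    }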

Instead, requests are proxied to a controller that we know manages a
container for the given action (the ContainerManager knows that),
thereby avoiding the creation of unnecessary containers. If the load is
too high to be handled by the M existing containers, the controllers
managing them will request new ones, which get distributed across all
controllers. Eventually, given enough load, every controller will have
containers to manage for each action.
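
To illustrate the decision the ContainerManager makes at distribution
time, here is a rough sketch. The names (Routing, UseOwn, ProxyTo,
distribute) are made up for illustration and are not part of the actual
proposal or code base; the sketch also assumes that at least one
container exists for the action:

    object Distribution extends App {
      sealed trait Routing
      case class UseOwn(containers: Int)     extends Routing  // schedule locally
      case class ProxyTo(controller: String) extends Routing  // forward the request

      // allocation: how many containers of this action each controller manages
      def distribute(allocation: Map[String, Int],
                     controllers: Seq[String]): Map[String, Routing] = {
        val owners = controllers.filter(c => allocation.getOrElse(c, 0) > 0)
        require(owners.nonEmpty, "at least one container must exist for the action")
        controllers.zipWithIndex.map { case (c, i) =>
          val n = allocation.getOrElse(c, 0)
          if (n > 0) c -> UseOwn(n)
          else       c -> ProxyTo(owners(i % owners.size))    // spread proxies over owners
        }.toMap
      }

      // One container on controller0, three controllers in total: controller1 and
      // controller2 are told to proxy to controller0 instead of creating containers.
      println(distribute(Map("controller0" -> 1),
                         Seq("controller0", "controller1", "controller2")))
    }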

The ContainerManager only needs to know which controller has which
container. It does not need to know what state those containers are in.
If they are busy, the owning controller itself will request more
resources accordingly.
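
In terms of bookkeeping, a minimal sketch of that state could look like
the following (again, the names are assumptions for illustration only).
Note that there is no busy or paused flag anywhere; if the existing
containers are overloaded, the owning controller simply asks for more
and the map grows:

    class ContainerManagerState {
      // action -> (controller -> number of containers it manages for that action)
      private var allocations = Map.empty[String, Map[String, Int]]

      // record that a newly created container was handed to a controller
      def assign(action: String, controller: String): Unit = {
        val perController = allocations.getOrElse(action, Map.empty[String, Int])
        val count         = perController.getOrElse(controller, 0) + 1
        allocations += action -> (perController + (controller -> count))
      }

      // used at distribution time to decide who proxies to whom
      def allocationFor(action: String): Map[String, Int] =
        allocations.getOrElse(action, Map.empty[String, Int])
    }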

Hope that clarifies things.

Cheers,
Markus

2018-07-25 14:19 GMT+02:00 David Breitgand <davi...@il.ibm.com>:
> Hi Markus,
>
> I'd like to better understand the edge case.
>
> Citing from the wiki.
>
>>> Edge case: If an action only has a very small amount of containers
> (less than there are Controllers in the system), we have a problem with
> the method described above.
>
> Isn't there always at least one controller in the system? I think the
> problem is not the number of Controllers, but rather the availability of
> prewarm containers that these Controllers control. If all containers of
> this Controller are busy at the moment, the concurrency level per
> container is 1, and an invocation hits this controller, it cannot execute
> the action immediately with one of its containers. Is that the problem
> that is being solved?
>
>>> Since the front-door schedules round-robin or least-connected, it's
> impossible to decide to which Controller the request needs to go to hit
> one that has a container available.
> In this case, the other Controllers (who didn't get a container) act as a
> proxy and send the request to a Controller that actually has a container
> (maybe even via HTTP redirect). The ContainerManager decides which
> Controllers will act as a proxy in this case, since it's the instance that
> distributes the containers.
>>>
>
> When reading your proposal, I was under the impression that ContainerManager
> only knows about the existence of containers allocated to the Controllers
> (because they asked), but ContainerManager does not know about the state
> of these containers at any given moment (i.e., whether they are busy
> running some action or not). I don't see Controllers updating
> ContainerManager about this in your diagrams.
>
> Thanks.
>
> -- david
>
>
>
>
> From:   "Markus Thoemmes" <markus.thoem...@de.ibm.com>
> To:     dev@openwhisk.apache.org
> Date:   23/07/2018 02:21 PM
> Subject:        Re: Proposal on a future architecture of OpenWhisk
>
>
>
> Hi Dominic,
>
> let's see if I can clarify the specific points one by one.
>
>>1. Docker daemon performance issue.
>>
>>...
>>
>>That's the reason why I initially thought that a Warmed state would
>>be kept
>>for more than today's behavior.
>>Today, containers would stay in the Warmed state only for 50ms, so it
>>introduces PAUSE/RESUME in case action comes with the interval of
>>more than
>>50 ms such as 1 sec.
>>This will lead to more loads on Docker daemon.
>
> You're right that the docker daemon's throughput is indeed an issue.
>
> Please note that PAUSE/RESUME are not executed via the docker daemon in a
> performance-tuned environment, but rather via runc, which does not have
> such a throughput issue because it's not a daemon at all. PAUSE/RESUME
> latencies are ~10ms for each operation.
>
> Further, the duration of the pauseGrace is not related to the overall
> architecture at all. Rather, it's kept so narrow to safeguard against
> users stealing cycles from the vendor's infrastructure. It's also a
> configurable value, so you can tweak it as you want.
>
> The proposed architecture itself has no impact on the pauseGrace.
>
>>
>>And if the state of containers is changing like today, the state in
>>ContainerManager would be frequently changing as well.
>>This may induce a synchronization issue among controllers and, among
>>ContainerManagers(in case there would be more than one
>>ContainerManager).
>
> The ContainerManager will NOT be informed about pause/unpause state
> changes, and it doesn't need to be. I agree that such behavior would
> generate serious load on the ContainerManager, but I think it's
> unnecessary.
>
>>2. Proxy case.
>>
>>...
>>
>>If it goes this way, ContainerManager should know all the status of
>>containers in all controllers to make a right decision and it's not
>>easy to
>>synchronize all the status of containers in controllers.
>>If it does not work like this, how can controller2 proxy requests to
>>controller1 without any information about controller1's status?
>
>
> The ContainerManager distributes a list of containers across all
> controllers. If it does not have enough containers at hand to give one to
> each controller, it instead tells controller2 to proxy to controller1,
> because the ContainerManager knows at distribution time that controller1
> has such a container.
>
> No synchronization needed between controllers at all.
>
> If controller1 gets more requests than the single container can handle, it
> will
> request more containers, so eventually controller2 will get its own.
>
> Please refer to
> https://lists.apache.org/thread.html/84a7b8171b90719c2f7aab86bea48a7e7865874c4e54f082b0861380@%3Cdev.openwhisk.apache.org%3E
>
> for more information on that protocol.
>
>
>>3. Intervention among multiple actions
>>
>>If the concurrency limit is 1, and the container lifecycle is managed
>>like
>>today, intervention among multiple actions can happen again.
>>For example, the maximum number of containers which can be created by
>>a
>>user is 2, and ActionA and ActionB invocation requests come
>>alternatively,
>>controllers will try to remove and recreate containers again and
>>again.
>>I used an example with a small number of max container limit for
>>simplicity, but it can happen with a higher limit as well.
>>
>>And though concurrency limit is more than 1 such as 3, it also can
>>happen
>>if actions come more quickly than the execution time of actions.
>
> The controller will never try to delete a container at all, nor does its
> pool of managed containers have a limit.
> If it doesn't have a container for ActionA, it will request one from the
> ContainerManager.
> If it doesn't have one for ActionB, it will request one from the
> ContainerManager.
>
> There will be 2 containers in the system and assuming that the
> ContainerManager has enough
> resources to keep those 2 containers alive, it will not delete them.
>
> The controllers by design cannot cause the behavior you're describing. The
> architecture is actually built around fixing this exact issue (eviction
> due to multiple heavy users in the system).
>
>>4. Is concurrency per container controlled by users in a per-action
>>based
>>way?
>>Let me clarify my question about concurrency limit.
>>
>>If concurrency per container limit is more than 1, there could be
>>multiple
>>actions being invoked at some point.
>>If the action requires high memory footprint such as 200MB or 150MB,
>>it can
>>crash if the sum of memory usage of concurrent actions exceeds the
>>container memory.
>>(In our case(here), some users are executing headless-chrome and
>>puppeteer
>>within actions, so it could happen under the similar situation.)
>>
>>So I initially thought concurrency per container is controlled by
>>users in
>>a per-action based way.
>>If concurrency per container is only configured by OW operators
>>statically,
>>some users may not be able to invoke their actions correctly in the
>>worst
>>case though operators increased the memory of the biggest container
>>type.
>>
>>And not only for this case, there could be some more reasons that
>>some
>>users just want to invoke their actions without per-container
>>concurrency
>>but the others want it for better throughput.
>>
>>So we may need some logic for users to take care of per-container
>>concurrency for each actions.
>
> Yes, the intention is to provide exactly what you're describing; maybe I
> worded it weirdly in my last response.
>
> This is not relevant for the architecture though.
>
>
>>5. Better to wait for the completion rather than creating a new
>>container.
>>According to the workload, it would be better to wait for the
>>previous
>>execution rather than creating a new container because it takes upto
>>500ms
>>~ 1s.
>>Even though the concurrency limit is more than 1, it still can happen
>>if
>>there is no logic to cumulate invocations and decide whether to
>>create a
>>new container or waiting for the existing container.
>
> The proposed asynchronous protocol between controller and ContainerManager
> accomplishes this by design:
>
> If a controller does not have the resources to execute the current
> request, it requests those resources.
> The ContainerManager updates resources asynchronously.
> The Controller will schedule the outstanding request as soon as it gets
> resources for it. It does not care whether those resources became free
> because another request finished or because it got a fresh container from
> the ContainerManager. Requests will always be dispatched as soon as
> resources are free.
>
>>6. HA of ContainerManager.
>>Since it is mandatory to deploy the system without any downtime to
>>use it
>>for production, we need to support HA of ContainerManager.
>>It means the state of ContainerManager should be replicated among
>>replicas.
>>(No matter which method we use between master/slave or clustering.)
>>
>>If ContainerManager knows about the status of each container, it
>>would not
>>be easy to support HA with its eventual consistent nature.
>>If it does only know which containers are assigned to which
>>controllers, it
>>cannot handle the edge case as I mentioned above.
>
> I agree, HA is mandatory. Since the ContainerManager operates only on the
> container creation/deletion path,
> we can probably afford to persist its state into something like Redis. If
> it crashes, the slave instance
> can take over immediately without any eventual-consistency concerns or
> downtime.
>
> Also note that downtime of the ContainerManager will ONLY impact the
> ability to create containers. Workloads that already have containers
> created will continue to work just fine.
>
>
> Does that answer/mitigate your concerns?
>
> Cheers,
> Markus
>
>
>>To: dev@openwhisk.apache.org
>>From: Dominic Kim <style9...@gmail.com>
>>Date: 07/23/2018 12:48PM
>>Subject: Re: Proposal on a future architecture of OpenWhisk
>>
>>Dear Markus.
>>
>>I may not correctly understand the direction of new architecture.
>>So let me describe my concerns in more details.
>>
>>Since that is a future architecture of OpenWhisk and requires many
>>breaking
>>changes, I think it should at least address all known issues.
>>So I focused on figuring out whether it handles all issues which are
>>reported in my proposal.
>>(
>>https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling
>>)
>>
>>1. Docker daemon performance issue.
>>
>>The most critical issue is poor performance of docker daemon.
>>Since it is not inherently designed for high throughput or concurrent
>>processing, Docker daemon shows poor performance in comparison with
>>OW.
>>In OW(serverless) world, action execution can be finished within 5ms
>>~
>>10ms, but the Docker daemon shows 100 ~ 500ms latency.
>>Still, we can take advantage of Prewarm and Warmed containers, but
>>under
>>the situation where container creation/deletion/pausing/resuming
>>happen
>>frequently and the situation lasted for long-term, the requests are
>>delayed
>>and even the Docker daemon crashed.
>>So I think it is important to reduce the loads(requests) against the
>>Docker
>>daemon.
>>
>>That's the reason why I initially thought that a Warmed state would
>>be kept
>>for more than today's behavior.
>>Today, containers would stay in the Warmed state only for 50ms, so it
>>introduces PAUSE/RESUME in case action comes with the interval of
>>more than
>>50 ms such as 1 sec.
>>This will lead to more loads on Docker daemon.
>>
>>And if the state of containers is changing like today, the state in
>>ContainerManager would be frequently changing as well.
>>This may induce a synchronization issue among controllers and, among
>>ContainerManagers(in case there would be more than one
>>ContainerManager).
>>
>>So I think containers should be running for more than today's
>>pauseGrace
>>time.
>>With more than 1 concurrency limit per container, it would also be
>>better
>>to keep containers running(not paused) for more than 50ms.
>>
>>2. Proxy case.
>>
>>In the edge case where a container only exists in controller1, how
>>can
>>controller2 decide to proxy the request to controller1 rather than
>>just
>>creating its own container?
>>If it asks to ContainerManager, ContainerManager should know the
>>state of
>>the container in controller1.
>>If the container in controller1 is already busy, it would be better
>>to
>>create a new container in controller2 rather than proxying the
>>requests to
>>controller1.
>>
>>If it goes this way, ContainerManager should know all the status of
>>containers in all controllers to make a right decision and it's not
>>easy to
>>synchronize all the status of containers in controllers.
>>If it does not work like this, how can controller2 proxy requests to
>>controller1 without any information about controller1's status?
>>
>>3. Intervention among multiple actions
>>
>>If the concurrency limit is 1, and the container lifecycle is managed
>>like
>>today, intervention among multiple actions can happen again.
>>For example, the maximum number of containers which can be created by
>>a
>>user is 2, and ActionA and ActionB invocation requests come
>>alternatively,
>>controllers will try to remove and recreate containers again and
>>again.
>>I used an example with a small number of max container limit for
>>simplicity, but it can happen with a higher limit as well.
>>
>>And though concurrency limit is more than 1 such as 3, it also can
>>happen
>>if actions come more quickly than the execution time of actions.
>>
>>4. Is concurrency per container controlled by users in a per-action
>>based
>>way?
>>Let me clarify my question about concurrency limit.
>>
>>If concurrency per container limit is more than 1, there could be
>>multiple
>>actions being invoked at some point.
>>If the action requires high memory footprint such as 200MB or 150MB,
>>it can
>>crash if the sum of memory usage of concurrent actions exceeds the
>>container memory.
>>(In our case(here), some users are executing headless-chrome and
>>puppeteer
>>within actions, so it could happen under the similar situation.)
>>
>>So I initially thought concurrency per container is controlled by
>>users in
>>a per-action based way.
>>If concurrency per container is only configured by OW operators
>>statically,
>>some users may not be able to invoke their actions correctly in the
>>worst
>>case though operators increased the memory of the biggest container
>>type.
>>
>>And not only for this case, there could be some more reasons that
>>some
>>users just want to invoke their actions without per-container
>>concurrency
>>but the others want it for better throughput.
>>
>>So we may need some logic for users to take care of per-container
>>concurrency for each actions.
>>
>>5. Better to wait for the completion rather than creating a new
>>container.
>>According to the workload, it would be better to wait for the
>>previous
>>execution rather than creating a new container because it takes upto
>>500ms
>>~ 1s.
>>Even though the concurrency limit is more than 1, it still can happen
>>if
>>there is no logic to cumulate invocations and decide whether to
>>create a
>>new container or waiting for the existing container.
>>
>>
>>6. HA of ContainerManager.
>>Since it is mandatory to deploy the system without any downtime to
>>use it
>>for production, we need to support HA of ContainerManager.
>>It means the state of ContainerManager should be replicated among
>>replicas.
>>(No matter which method we use between master/slave or clustering.)
>>
>>If ContainerManager knows about the status of each container, it
>>would not
>>be easy to support HA with its eventual consistent nature.
>>If it does only know which containers are assigned to which
>>controllers, it
>>cannot handle the edge case as I mentioned above.
>>
>>
>>
>>Since many parts of the architecture are not addressed yet, I think
>>it
>>would be better to separate each parts and discuss further deeply.
>>But in the big picture, I think we need to figure out whether it can
>>handle
>>or at least alleviate all known issues or not first.
>>
>>
>>Best regards,
>>Dominic
>>
>>
>>2018-07-21 1:36 GMT+09:00 David P Grove <gro...@us.ibm.com>:
>>
>>>
>>>
>>> Tyson Norris <tnor...@adobe.com.INVALID> wrote on 07/20/2018
>>12:24:07 PM:
>>> >
>>> > On Logging, I think if you are considering enabling concurrent
>>> > activation processing, you will encounter that the only approach
>>to
>>> > parsing logs to be associated with a specific activationId, is to
>>> > force the log output to be structured, and always include the
>>> > activationId with every log message. This requires a change at
>>the
>>> > action container layer, but the simpler thing to do is to
>>encourage
>>> > action containers to provide a structured logging context that
>>> > action developers can (and must) use to generate logs.
>>>
>>> Good point.  I agree that if there is concurrent activation
>>processing in
>>> the container, structured logging is the only sensible thing to do.
>>>
>>>
>>> --dave
>>>
>>
