Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Rodric Rabbah
> A second idea that comes to my mind would be to implement a shared
counter (it can easily be eventually consistent, consistency is I think not
a concern here).

This is simply a drive-by comment, as I have not directly weighed in on the
rest of the discussion. But this comment about a shared counter reminded me
of a recent interview with Tim Wagner
https://read.acloud.guru/serverless-and-blockchain-an-interview-with-tim-wagner-from-aws-to-coinbase-f3b2b5939790
about SQS and Lambda integration:

   So in order to give customers — and ourselves, frankly — some
control over that, we had to go invent an entire new feature, concurrency
controls per function in Lambda, which also meant we had to have metrics
and internal infrastructure for that.

   That required us to change some of our architecture to produce what
we call the high-speed counting service, and so on and so forth.
There’s a whole lot of iceberg below the waterline for the piece that comes
poking above the top.

-r


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Markus Thömmes
Hi,

On Fri, Aug 24, 2018 at 00:07, Tyson Norris
wrote:

> > Router is not pulling at queue for "specific actions", just for any
> action
> > that might replace idle containers - right? This is complicated with
> > concurrency though since while a container is not idle (paused +
> > removable), it may be useable, but only if the action received is
> the same
> > as one existing warm container, and that container has concurrency
> slots
> > available for additional activations. It may be helpful to diagram
> some of
> > this stealing queue flow a bit more; I'm not seeing how it will work
> out
> > other than creating more containers than is absolutely required,
> which may
> > be ok, not sure.
> >
>
> Yes, I will diagram things out soonish; I'm a bit short on time
> currently.
>
> The idea is that indeed the Router pulls for *specific* actions. This
> is a
> problem when using Kafka, but might be solvable when we don't require
> Kafka. I have to test this for feasibility though.
>
>
> Hmm OK - it's not clear how a router that is empty (not servicing any
> activations) becomes a router that is pulling for that specific action,
> when other routers pulling for that action are at capacity (so new
> containers are needed)
>

Disclaimer: All of this is very much at the idea stage and not part of the
original proposal.

That's where the second part of the "idea" that I had above comes in: If we
can somehow detect that nobody is pulling (all Routers are at capacity or
have no container), that is the moment where we need to create new containers.

I proposed further above in the discussion that the Routers do *not* ask
for more Containers, but rather that this signal from the MQ (hey, apparently
nobody has capacity for that action... so let's create some) could be
used to create Containers.

That's a very blunt assumption though; I don't know if that's feasible at
all. There might be a performant way of implementing just that signal in a
distributed way though (the signal being: we are out of capacity over the
whole cluster, we need more). A second idea that comes to my mind would be
to implement a shared counter (it can easily be eventually consistent;
consistency is, I think, not a concern here). Once that counter drops to 0 or
below 0, we need more Containers.
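
To make that counter idea concrete, here is a minimal sketch (all names are
hypothetical; none of this exists in OpenWhisk today): each Router batches its
slot changes locally and merges them into a shared, eventually consistent
store periodically; once the merged value drops to 0 or below, we ask for
more Containers.

import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: the store only needs eventual consistency. Staleness
// just means we create a Container slightly too early or too late.
trait SharedCounterStore {
  def addAndGet(action: String, delta: Long): Long
}

class CapacityCounter(store: SharedCounterStore, requestContainers: String => Unit) {
  private val localDelta = new AtomicLong(0)

  def slotFreed(): Unit    = localDelta.incrementAndGet()  // a container slot opened up
  def slotOccupied(): Unit = localDelta.decrementAndGet()  // a slot was taken

  // Called periodically: merge local changes into the shared counter and
  // request more Containers if cluster-wide capacity is exhausted.
  def flush(action: String): Unit = {
    val global = store.addAndGet(action, localDelta.getAndSet(0))
    if (global <= 0) requestContainers(action)
  }
}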

This is, I think, where the prototyping thread comes in: Personally, I don't
feel comfortable stating that any of the approaches I outlined really
work without having tried them out.


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Tyson Norris
> Router is not pulling at queue for "specific actions", just for any action
> that might replace idle containers - right? This is complicated with
> concurrency though since while a container is not idle (paused +
> removable), it may be useable, but only if the action received is the same
> as one existing warm container, and that container has concurrency slots
> available for additional activations. It may be helpful to diagram some of
> this stealing queue flow a bit more; I'm not seeing how it will work out
> other than creating more containers than is absolutely required, which may
> be ok, not sure.
>

Yes, I will diagram things out soonish; I'm a bit short on time
currently.

The idea is that indeed the Router pulls for *specific* actions. This is a
problem when using Kafka, but might be solvable when we don't require
Kafka. I have to test this for feasibility though.


Hmm OK - it's not clear how a router that is empty (not servicing any 
activations) becomes a router that is pulling for that specific action, when 
other routers pulling for that action are at capacity (so new containers are 
needed)




Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Markus Thömmes
Hi Dave,

I agree! I'll start another thread on a discussion of how/where we
prototype things to hash out some of the unknowns.

Cheers,
Markus

On Thu, Aug 23, 2018 at 22:05, David P Grove wrote:

>
> Related to the random vs. smart routing discussion.
>
> A key unknown that influences the design is how much load we can drive
> through a single ContainerRouter.
> + If they are highly scalable (500 to 1000 containers per router),
> then even a fairly large OpenWhisk deployment could be running with a
> handful of ContainerRouters (and smart routing is quite viable).
> + If they are less scalable (10s to 100 containers per router) then
> large deployments will be running with 50+ ContainerRouters and smart
> routing degrades to random routing in terms of container reuse for our long
> tail workloads.
>
> --dave
>


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Markus Thömmes
Hi Tyson,

On Thu, Aug 23, 2018 at 21:28, Tyson Norris
wrote:

> >
> > And each ContainerRouter has a queue consumer that presumably pulls
> from
> > the queue constantly? Or is consumption based on something else? If
> all
> > ContainerRouters are consuming at the same rate, then while this does
> > distribute the load across ContainerRouters, it doesn't really
> guarantee
> > any similar state (number of containers, active connections, etc) at
> each
> > ContainerRouter, I think. Maybe I am missing something here?
> >
>
>
> The idea is that ContainerRouters do **not** pull from the queue
> constantly. They pull work for actions that they have idle containers
> for.
>
> Router is not pulling at queue for "specific actions", just for any action
> that might replace idle containers - right? This is complicated with
> concurrency though since while a container is not idle (paused +
> removable), it may be useable, but only if the action received is the same
> as one existing warm container, and that container has concurrency slots
> available for additional activations. It may be helpful to diagram some of
> this stealing queue flow a bit more; I'm not seeing how it will work out
> other than creating more containers than is absolutely required, which may
> be ok, not sure.
>

Yes, I will diagram things out soonish; I'm a bit short on time
currently.

The idea is that indeed the Router pulls for *specific* actions. This is a
problem when using Kafka, but might be solvable when we don't require
Kafka. I have to test this for feasibility though.


>
> Similar state in terms of number of containers is done via the
> ContainerManager. Active connections should roughly even out with the
> queue
> being pulled on idle.
>
> Yeah carefully defining "idle" may be tricky, if we want to achieve
> absolute minimum containers in use for a specific action at any time.
>
>
> >
> > The edge-case here is for very slow load. It's minimizing the
> number of
> > Containers needed. Another example:
> > Say you have 3 Routers. A request for action X comes in, goes to
> > Router1.
> > It requests a container, puts the work on the queue, nobody
> steals it,
> > as
> > soon as the Container gets ready, the work is taken from the
> queue and
> > executed. All nice and dandy.
> >
> > Important remark: The Router that requested more Containers is
> not
> > necessarily the one that's getting the Containers. We need to
> make
> > sure to
> > evenly distribute Containers across the system.
> >
> > So back to our example: What happens if requests for action X
> are made
> > one
> > after the other? Well, the layer above the Routers (something
> needs to
> > loadbalance them, be it DNS or another type of routing layer)
> isn't
> > aware
> > of the locality of the Container that we created to execute
> action X.
> > As it
> > schedules fairly randomly (round-robin in a multi-tenant system
> is
> > essentially random) the action will hit each Router once very
> soon. As
> we're only generating one request after the other, arguably we
> want to
> create only one container.
> >
> > That's why in this example the 2 remaining Routers with no
> container
> > get a
> > reference to Router1.
> >
> > In the case you mentioned:
> > > it seems like sending to another Router which has the
> container, but
> > may
> > not be able to use it immediately, may cause failures in some
> cases.
> >
> > I don't recall if it's in the document or in the discussion on
> the
> > dev-list: The router would respond to the proxied request with a
> 503
> immediately. That tells the proxying router: Oh, apparently we
> need more
> > resources. So it requests another container etc etc.
> >
> > Does that clarify that specific edge-case?
> >
> > Yes, but I would not call this an edge-case - I think it is more of
> a
> > ramp up to maximum container reuse, and will probably be dramatically
> impacted
> > by containers that do NOT support concurrency (will get a 503 when a
> single
> > activation is in flight, vs high concurrency container, which would
> cause
> > 503 only once max concurrency reached).
> > If each ContainerRouter is as likely to receive the original
> request, and
> > each is also as likely to receive the queued item from the stealing
> queue,
> > then there will be a lot of cross traffic during the ramp up from 1
> > container to  containers. E.g.
> >
> > From client:
> > Request1 -> Router 1 -> queue (no containers)
> > Request2 -> Router 2 -> queue (no containers)
> > Request3 -> Router 3 -> queue (no containers)
> > From queue:
> > 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread David P Grove

Related to the random vs. smart routing discussion.

A key unknown that influences the design is how much load we can drive
through a single ContainerRouter.
+ If they are highly scalable (500 to 1000 containers per router),
then even a fairly large OpenWhisk deployment could be running with a
handful of ContainerRouters (and smart routing is quite viable).
+ If they are less scalable (10s to 100 containers per router) then
large deployments will be running with 50+ ContainerRouters and smart
routing degrades to random routing in terms of container reuse for our long
tail workloads.

--dave


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Tyson Norris
>
> And each ContainerRouter has a queue consumer that presumably pulls from
> the queue constantly? Or is consumption based on something else? If all
> ContainerRouters are consuming at the same rate, then while this does
> distribute the load across ContainerRouters, it doesn't really guarantee
> any similar state (number of containers, active connections, etc) at each
> ContainerRouter, I think. Maybe I am missing something here?
>


The idea is that ContainerRouters do **not** pull from the queue
constantly. They pull work for actions that they have idle containers for.

Router is not pulling at queue for "specific actions", just for any action that 
might replace idle containers - right? This is complicated with concurrency 
though since while a container is not idle (paused + removable), it may be 
useable, but only if the action received is the same as one existing warm 
container, and that container has concurrency slots available for additional 
activations. It may be helpful to diagram some of this stealing queue flow a 
bit more; I'm not seeing how it will work out other than creating more
containers than is absolutely required, which may be ok, not sure. 

Similar state in terms of number of containers is done via the
ContainerManager. Active connections should roughly even out with the queue
being pulled on idle.

Yeah carefully defining "idle" may be tricky, if we want to achieve absolute 
minimum containers in use for a specific action at any time.


>
> The edge-case here is for very slow load. It's minimizing the number 
of
> Containers needed. Another example:
> Say you have 3 Routers. A request for action X comes in, goes to
> Router1.
> It requests a container, puts the work on the queue, nobody steals it,
> as
> soon as the Container gets ready, the work is taken from the queue and
> executed. All nice and dandy.
>
> Important remark: The Router that requested more Containers is not
> necessarily the one that's getting the Containers. We need to make
> sure to
> evenly distribute Containers across the system.
>
> So back to our example: What happens if requests for action X are made
> one
> after the other? Well, the layer above the Routers (something needs to
> loadbalance them, be it DNS or another type of routing layer) isn't
> aware
> of the locality of the Container that we created to execute action X.
> As it
> schedules fairly randomly (round-robin in a multi-tenant system is
> essentially random) the action will hit each Router once very soon. As
> we're only generating one request after the other, arguably we
> want to
> create only one container.
>
> That's why in this example the 2 remaining Routers with no container
> get a
> reference to Router1.
>
> In the case you mentioned:
> > it seems like sending to another Router which has the container, but
> may
> not be able to use it immediately, may cause failures in some cases.
>
> I don't recall if it's in the document or in the discussion on the
> dev-list: The router would respond to the proxied request with a 503
> immediately. That tells the proxying router: Oh, apparently we need 
more
> resources. So it requests another container etc etc.
>
> Does that clarify that specific edge-case?
>
> Yes, but I would not call this an edge-case - I think it is more of a
> ramp up to maximum container reuse, and will probably be dramatically
> impacted
> by containers that do NOT support concurrency (will get a 503 when a 
single
> activation is in flight, vs high concurrency container, which would cause
> 503 only once max concurrency reached).
> If each ContainerRouter is as likely to receive the original request, and
> each is also as likely to receive the queued item from the stealing queue,
> then there will be a lot of cross traffic during the ramp up from 1
> container to  containers. E.g.
>
> From client:
> Request1 -> Router 1 -> queue (no containers)
> Request2 -> Router 2 -> queue (no containers)
> Request3 -> Router 3 -> queue (no containers)
> From queue:
> Request1 -> Router1  -> create and use container
> Request2 -> Router2 -> Router1 -> 503 -> create container
> Request3 -> Router3 -> Router1 -> 503 -> Router2 -> 503 -> create 
container
>
> In other words - the 503 may help when there is one container existing,
> and it is deemed to be busy, but what if there are 10 containers existing
> (on different Routers other than where the request was pulled from the
> stealing queue) - do you make HTTP requests to all 10 Routers to see if
> they are busy before creating a new 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread David P Grove

Great discussion; I'm not entirely convinced on part of this point though.


> We need a work-stealing queue here to dynamically rebalance between the
> Routers since the layer above the Routers has no idea about capacity and
> (at least that's my assumption) schedules randomly.

I agree we can't really keep track of actual current capacity outside of
the individual Router.  But I don't want to jump immediately from that to
assuming truly random scheduling at the layer above because it pushes a
pretty key problem down into the ContainerManager/ContainerRouter layer
(dealing with the "edge case" of finding hot containers for the very long
tail of actions that can be serviced by a very small number of running
containers).

The layer above could route based on runtime kind to increase the
probability of container reuse.

The layer above could still do some hash-based scheme to map an initial
"home" Router (or subset of Routers on a very large deployment) and rely on
work-stealing/overflow queue to deal with "noisy neighbor" hash collisions
if a Router gets badly overloaded.
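
To sketch what I mean by the hash-based scheme (hypothetical types, and
ignoring Router membership changes, which a consistent-hashing variant would
handle better):

case class RouterRef(id: Int)

// Map an action to a small, stable subset of Routers so warm containers
// concentrate there; overflow still spills to the work-stealing queue.
class HomeRouterPicker(routers: IndexedSeq[RouterRef], replicas: Int = 2) {
  def homesFor(fullyQualifiedAction: String): Seq[RouterRef] = {
    val h = fullyQualifiedAction.hashCode & Int.MaxValue // non-negative hash
    (0 until replicas).map(i => routers((h + i) % routers.size))
  }
}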

Each Router is potentially managing a fairly large pool of containers.  The
pools don't have to be the same size between Routers.  More crazily, the
Routers could even autoscale themselves to deal with uneven load (in
effect, hierarchical routing).

Lots of half-baked ideas are possible here :)

--dave


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-23 Thread Markus Thömmes
Hi Tyson,

On Thu, Aug 23, 2018 at 00:33, Tyson Norris
wrote:

> Hi - thanks for the discussion! More inline...
>
> On 8/22/18, 2:55 PM, "Markus Thömmes"  wrote:
>
> Hi Tyson,
>
> On Wed, Aug 22, 2018 at 23:37, Tyson Norris
> wrote:
>
> > Hi -
> > >
> > > When exactly is the case that a ContainerRouter should put a
> blocking
> > > activation to a queue for stealing? Since a) it is not spawning
> > containers
> > > and b) it is not parsing request/response bodies, can we say
> this
> > would
> > > only happen when a ContainerRouter maxes out its incoming
> request
> > handling?
> > >
> >
> > That's exactly the idea! The work-stealing queue will only be
> used if
> > the
> > Router where the request landed cannot serve the demand right
> now. For
> > example, if it maxed out the slots it has for a certain action
> (all
> > containers are working to their full extent) it requests more
> > resources and
> > puts the request-token on the work-stealing queue.
> >
> > So to clarify, ContainerRouter "load" (which can trigger use of
> queue) is
> > mostly (only?) based on:
> > * the number of Container references
> > * the number of outstanding inbound HTTP requests, e.g. when lots of
> > requests can be routed to the same container
> > * the number of outstanding outbound HTTP requests to remote action
> > containers (assume all are remote)
> > It is unclear what order of magnitude is considered for "maxed out
> slots",
> > since container refs should be simple (like ip+port, action metadata,
> > activation count, and warm state), inbound connection handling is
> basically
> > an HTTP server, and outbound is a connection pool per action container
> > (let's presume connection reuse for the moment).
> > I think it will certainly need testing to determine these and to be
> > configurable in any case, for each of these separate stats. Is there
> > anything else that affects the load for ContainerRouter?
> >
>
> "Overload" is determined by the availability of free slots on any
> container
> being able to serve the current action invocation (or rather the
> absence
> thereof). An example:
> Say RouterA has 2 containers for action X. Each container has an
> allowed
> concurrency of 10. On each of those 2 there are 10 active invocations
> already running (the ContainerRouter knows this, these are open
> connections
> to the containers). If another request comes in for X, we know we don't
> have capacity for it. We request more resources and offer the work we
> got
> for stealing.
>
> I don't think there are tweaks needed here. The Router keeps an
> "activeInvocations" number per container and compares that to the
> allowed
> concurrency on that container. If activeInvocations ==
> allowedConcurrency
> we're out of capacity and need more.
>
> We need a work-stealing queue here to dynamically rebalance between the
> Routers since the layer above the Routers has no idea about capacity
> and
> (at least that's my assumption) schedules randomly.
>
> I think it is confusing to say that the ContainerRouter doesn't have
> capacity for it - rather, the existing set of containers in the
> ContainerRouter don't have capacity for it. I understand now, in any case.
>

Noted, will adjust future wording on this, thanks!


> So there are a couple of active paths in ContainerRouter, still only
> considering sync/blocking activations:
> * warmpath - run immediately
> * coldpath - send to queue
>
> And each ContainerRouter has a queue consumer that presumably pulls from
> the queue constantly? Or is consumption based on something else? If all
> ContainerRouters are consuming at the same rate, then while this does
> distribute the load across ContainerRouters, it doesn't really guarantee
> any similar state (number of containers, active connections, etc) at each
> ContainerRouter, I think. Maybe I am missing something here?
>


The idea is that ContainerRouters do **not** pull from the queue
constantly. They pull work for actions that they have idle containers for.

Similar state in terms of number of containers is done via the
ContainerManager. Active connections should roughly even out with the queue
being pulled on idle.
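
A minimal sketch of that pull-on-idle behaviour (hypothetical API, and
glossing over how per-action queues would be implemented on the chosen MQ):

case class Activation(action: String, payload: Array[Byte])

trait WorkStealingQueue {
  def tryPoll(action: String): Option[Activation] // non-blocking, per action
}

class RouterPullLoop(queue: WorkStealingQueue, execute: Activation => Unit) {
  // Called whenever a container for `action` finishes work: before marking
  // it idle, try to steal queued work for that same action.
  def onContainerFree(action: String): Unit =
    queue.tryPoll(action) match {
      case Some(work) => execute(work) // absorb queued work into the free slot
      case None       => ()            // queue empty; the container stays idle
    }
}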


>
>

>
>
> >
> > That request-token will then be taken by any Router that has free
> > capacity
> > for that action (note: this is not simple with kafka, but might
> be
> > simpler
> > with other MQ technologies). Since new resources have been
> requested,
> > it is
> > guaranteed that one Router will eventually become free.
> >
> > Is "requests resources" here requesting new action containers, which
> it
> > won't be able to process itself immediately, but should startup +
> warm and
> > be 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Hi - thanks for the discussion! More inline...

On 8/22/18, 2:55 PM, "Markus Thömmes"  wrote:

Hi Tyson,

On Wed, Aug 22, 2018 at 23:37, Tyson Norris
wrote:

> Hi -
> >
> > When exactly is the case that a ContainerRouter should put a 
blocking
> > activation to a queue for stealing? Since a) it is not spawning
> containers
> > and b) it is not parsing request/response bodies, can we say this
> would
> > only happen when a ContainerRouter maxes out its incoming request
> handling?
> >
>
> That's exactly the idea! The work-stealing queue will only be used if
> the
> Router where the request landed cannot serve the demand right now. For
> example, if it maxed out the slots it has for a certain action (all
> containers are working to their full extent) it requests more
> resources and
> puts the request-token on the work-stealing queue.
>
> So to clarify, ContainerRouter "load" (which can trigger use of queue) is
> mostly (only?) based on:
> * the number of Container references
> * the number of outstanding inbound HTTP requests, e.g. when lots of
> requests can be routed to the same container
> * the number of outstanding outbound HTTP requests to remote action
> containers (assume all are remote)
> It is unclear what order of magnitude is considered for "maxed out slots",
> since container refs should be simple (like ip+port, action metadata,
> activation count, and warm state), inbound connection handling is basically
> an HTTP server, and outbound is a connection pool per action container
> (let's presume connection reuse for the moment).
> I think it will certainly need testing to determine these and to be
> configurable in any case, for each of these separate stats. Is there
> anything else that affects the load for ContainerRouter?
>

"Overload" is determined by the availability of free slots on any container
being able to serve the current action invocation (or rather the absence
thereof). An example:
Say RouterA has 2 containers for action X. Each container has an allowed
concurrency of 10. On each of those 2 there are 10 active invocations
already running (the ContainerRouter knows this, these are open connections
to the containers). If another request comes in for X, we know we don't
have capacity for it. We request more resources and offer the work we got
for stealing.

I don't think there are tweaks needed here. The Router keeps an
"activeInvocations" number per container and compares that to the allowed
concurrency on that container. If activeInvocations == allowedConcurrency
we're out of capacity and need more.

We need a work-stealing queue here to dynamically rebalance between the
Routers since the layer above the Routers has no idea about capacity and
(at least that's my assumption) schedules randomly.

I think it is confusing to say that the ContainerRouter doesn't have capacity 
for it - rather, the existing set of containers in the ContainerRouter don't 
have capacity for it. I understand now, in any case.
So there are a couple of active paths in ContainerRouter, still only 
considering sync/blocking activations:
* warmpath - run immediately
* coldpath - send to queue

And each ContainerRouter has a queue consumer that presumably pulls from the 
queue constantly? Or is consumption based on something else? If all 
ContainerRouters are consuming at the same rate, then while this does 
distribute the load across ContainerRouters, it doesn't really guarantee any 
similar state (number of containers, active connections, etc) at each 
ContainerRouter, I think. Maybe I am missing something here?
  




>
> That request-token will then be taken by any Router that has free
> capacity
> for that action (note: this is not simple with kafka, but might be
> simpler
> with other MQ technologies). Since new resources have been requested,
> it is
> guaranteed that one Router will eventually become free.
>
> Is "requests resources" here requesting new action containers, which it
> won't be able to process itself immediately, but should startup + warm and
> be provided to "any ContainerRouter"? This makes sense, just want to
> clarify that "resources == containers".
>

Yes, resources == containers.


>
> >
> > If ContainerManager has enough awareness of ContainerRouters'
> states, I'm
> > not sure where using a queue would be used (for redirecting to other
> > ContainerRouters) vs ContainerManager responding with a
> ContainerRouters
> > reference (instead of an action container reference) - I'm not
> following
> > the logic of the edge case in the proposal - there is 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Markus Thömmes
Hi Tyson,

On Wed, Aug 22, 2018 at 23:37, Tyson Norris
wrote:

> Hi -
> >
> > When exactly is the case that a ContainerRouter should put a blocking
> > activation to a queue for stealing? Since a) it is not spawning
> containers
> > and b) it is not parsing request/response bodies, can we say this
> would
> > only happen when a ContainerRouter maxes out its incoming request
> handling?
> >
>
> That's exactly the idea! The work-stealing queue will only be used if
> the
> Router where the request landed cannot serve the demand right now. For
> example, if it maxed out the slots it has for a certain action (all
> containers are working to their full extent) it requests more
> resources and
> puts the request-token on the work-stealing queue.
>
> So to clarify, ContainerRouter "load" (which can trigger use of queue) is
> mostly (only?) based on:
> * the number of Container references
> * the number of outstanding inbound HTTP requests, e.g. when lots of
> requests can be routed to the same container
> * the number of outstanding outbound HTTP requests to remote action
> containers (assume all are remote)
> It is unclear what order of magnitude is considered for "maxed out slots",
> since container refs should be simple (like ip+port, action metadata,
> activation count, and warm state), inbound connection handling is basically
> an HTTP server, and outbound is a connection pool per action container
> (let's presume connection reuse for the moment).
> I think it will certainly need testing to determine these and to be
> configurable in any case, for each of these separate stats. Is there
> anything else that affects the load for ContainerRouter?
>

"Overload" is determined by the availability of free slots on any container
being able to serve the current action invocation (or rather the absence
thereof). An example:
Say RouterA has 2 containers for action X. Each container has an allowed
concurrency of 10. On each of those 2 there are 10 active invocations
already running (the ContainerRouter knows this, these are open connections
to the containers). If another request comes in for X, we know we don't
have capacity for it. We request more resources and offer the work we got
for stealing.

I don't think there are tweaks needed here. The Router keeps an
"activeInvocations" number per container and compares that to the allowed
concurrency on that container. If activeInvocations == allowedConcurrency
we're out of capacity and need more.
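
In code, that bookkeeping could look roughly like this (a sketch with
hypothetical types; the real thing would track open HTTP connections):

final case class ContainerRef(id: String, allowedConcurrency: Int) {
  var activeInvocations: Int = 0 // open connections to this container
}

class ActionCapacity(containers: Seq[ContainerRef]) {
  // Try to reserve a slot on any container serving this action. If none is
  // free, the caller requests more resources and offers the request-token
  // onto the work-stealing queue.
  def tryReserve(): Option[ContainerRef] =
    containers.find(c => c.activeInvocations < c.allowedConcurrency).map { c =>
      c.activeInvocations += 1
      c
    }

  def release(c: ContainerRef): Unit = c.activeInvocations -= 1
}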

We need a work-stealing queue here to dynamically rebalance between the
Routers since the layer above the Routers has no idea about capacity and
(at least that's my assumption) schedules randomly.


>
> That request-token will then be taken by any Router that has free
> capacity
> for that action (note: this is not simple with kafka, but might be
> simpler
> with other MQ technologies). Since new resources have been requested,
> it is
> guaranteed that one Router will eventually become free.
>
> Is "requests resources" here requesting new action containers, which it
> won't be able to process itself immediately, but should startup + warm and
> be provided to "any ContainerRouter"? This makes sense, just want to
> clarify that "resources == containers".
>

Yes, resources == containers.


>
> >
> > If ContainerManager has enough awareness of ContainerRouters'
> states, I'm
> > not sure where using a queue would be used (for redirecting to other
> > ContainerRouters) vs ContainerManager responding with a
> ContainerRouters
> > reference (instead of an action container reference) - I'm not
> following
> > the logic of the edge case in the proposal - there is mention of
> "which
> > controller the request needs to go", but maybe this is a typo and
> should
> > say ContainerRouter?
> >
>
> Indeed that's a typo, it should say ContainerRouter.
>
> The ContainerManager only knows which Router has which Container. It
> does
> not know whether the respective Router has capacity on that container
> (the
> capacity metric is very hard to share since it's ever changing).
>
> Hence, in an edge-case where there are fewer Containers than Routers,
> the
> ContainerManager can hand out references to the Routers it gave
> Containers
> to the Routers that have none. (This is the edge-case described in the
> proposal).
>
> I'm not sure why in this case the ContainerManager does not just create a
> new container, instead of sending to another Router? If there is some
> intended limit on "number of containers for a particular action", that
> would be a reason, but given that the ContainerManager cannot know the
> state of the existing containers, it seems like sending to another Router
> which has the container, but may not be able to use it immediately, may
> cause failures in some cases.
>

The edge-case here is for very slow load. It's minimizing the number of
Containers 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Hi - 
>
> When exactly is the case that a ContainerRouter should put a blocking
> activation to a queue for stealing? Since a) it is not spawning containers
> and b) it is not parsing request/response bodies, can we say this would
> only happen when a ContainerRouter maxes out its incoming request 
handling?
>

That's exactly the idea! The work-stealing queue will only be used if the
Router where the request landed cannot serve the demand right now. For
example, if it maxed out the slots it has for a certain action (all
containers are working to their full extent) it requests more resources and
puts the request-token on the work-stealing queue.

So to clarify, ContainerRouter "load" (which can trigger use of queue) is 
mostly (only?) based on:
* the number of Container references 
* the number of outstanding inbound HTTP requests, e.g. when lots of requests
can be routed to the same container
* the number of outstanding outbound HTTP requests to remote action containers
(assume all are remote)
It is unclear what order of magnitude is considered for "maxed out slots",
since container refs should be simple (like ip+port, action metadata,
activation count, and warm state), inbound connection handling is basically an
HTTP server, and outbound is a connection pool per action container (let's
presume connection reuse for the moment).
I think it will certainly need testing to determine these and to be
configurable in any case, for each of these separate stats. Is there anything
else that affects the load for ContainerRouter?

That request-token will then be taken by any Router that has free capacity
for that action (note: this is not simple with kafka, but might be simpler
with other MQ technologies). Since new resources have been requested, it is
guaranteed that one Router will eventually become free.

Is "requests resources" here requesting new action containers, which it won't 
be able to process itself immediately, but should startup + warm and be 
provided to "any ContainerRouter"? This makes sense, just want to clarify that 
"resources == containers".

>
> If ContainerManager has enough awareness of ContainerRouters' states, I'm
> not sure where using a queue would be used (for redirecting to other
> ContainerRouters) vs ContainerManager responding with a ContainerRouters
> reference (instead of an action container reference) - I'm not following
> the logic of the edge case in the proposal - there is mention of "which
> controller the request needs to go", but maybe this is a typo and should
> say ContainerRouter?
>

Indeed that's a typo, it should say ContainerRouter.

The ContainerManager only knows which Router has which Container. It does
not know whether the respective Router has capacity on that container (the
capacity metric is very hard to share since it's ever changing).

Hence, in an edge-case where there are fewer Containers than Routers, the
ContainerManager can hand out references to the Routers it gave Containers
to the Routers that have none. (This is the edge-case described in the
proposal).

I'm not sure why in this case the ContainerManager does not just create a new 
container, instead of sending to another Router? If there is some intended 
limit on "number of containers for a particular action", that would be a 
reason, but given that the ContainerManager cannot know the state of the 
existing containers, it seems like sending to another Router which has the 
container, but may not be able to use it immediately, may cause failures in 
some cases. 


The work-stealing queue though is used to rebalance work in case one of the
Routers gets overloaded.

Got it.

Thanks
Tyson
 



Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Markus Thömmes
Hi Tyson,

On Wed, Aug 22, 2018 at 22:49, Tyson Norris
wrote:

> Yes, agreed this makes sense, same as Carlos is saying.
>
> Let's ignore async for now, I think that one is simpler :) - does "A
> blocking request can still be put onto the work-stealing queue" mean that
> it wouldn't always be put on the queue?
>
> If there is existing warm container capacity in the ContainerRouter
> receiving the activation, ideally it would skip the queue - right?
>

Exactly, it should skip the queue whenever possible.


>
> When exactly is the case that a ContainerRouter should put a blocking
> activation to a queue for stealing? Since a) it is not spawning containers
> and b) it is not parsing request/response bodies, can we say this would
> only happen when a ContainerRouter maxes out its incoming request handling?
>

That's exactly the idea! The work-stealing queue will only be used if the
Router where the request landed cannot serve the demand right now. For
example, if it maxed out the slots it has for a certain action (all
containers are working to their full extent) it requests more resources and
puts the request-token on the work-stealing queue.

That request-token will then be taken by any Router that has free capacity
for that action (note: this is not simple with kafka, but might be simpler
with other MQ technologies). Since new resources have been requested, it is
guaranteed that one Router will eventually become free.


>
> If ContainerManager has enough awareness of ContainerRouters' states, I'm
> not sure where using a queue would be used (for redirecting to other
> ContainerRouters) vs ContainerManager responding with a ContainerRouters
> reference (instead of an action container reference) - I'm not following
> the logic of the edge case in the proposal - there is mention of "which
> controller the request needs to go", but maybe this is a typo and should
> say ContainerRouter?
>

Indeed that's a typo, it should say ContainerRouter.

The ContainerManager only knows which Router has which Container. It does
not know whether the respective Router has capacity on that container (the
capacity metric is very hard to share since it's ever changing).

Hence, in an edge-case where there are fewer Containers than Routers, the
ContainerManager can hand out references to the Routers it gave Containers
to the Routers that have none. (This is the edge-case described in the
proposal).
The work-stealing queue though is used to rebalance work in case one of the
Routers gets overloaded.
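
To sketch that handout (hypothetical names; the proposal doesn't prescribe an
API): the ContainerManager answers a resource request either with a Container
or with a reference to a Router that already holds one; a 503 on the proxied
request then tells the proxying Router to ask again for a fresh Container.

sealed trait Allocation
final case class UseContainer(containerRef: String) extends Allocation
final case class ProxyToRouter(routerRef: String)   extends Allocation

// owners: action -> Routers that were already given Containers for it
class ContainerManagerSketch(owners: Map[String, List[String]]) {
  def allocate(action: String, requester: String): Allocation =
    owners.getOrElse(action, Nil).find(_ != requester) match {
      case Some(owner) => ProxyToRouter(owner)         // fewer Containers than Routers
      case None        => UseContainer(create(action)) // no other owner: create one
    }

  private def create(action: String): String = s"container-for-$action" // placeholder
}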


>
> Thanks
> Tyson
>
> On 8/21/18, 1:16 AM, "Markus Thömmes"  wrote:
>
> Hi Tyson,
>
> if we take the concerns apart as I proposed above, timeouts should only
> ever be triggered after a request is scheduled as you say, that is: As
> soon
> as it's crossing the user-container mark. With the concern separation,
> it
> is plausible that blocking invocations are never buffered anywhere,
> which
> makes a lot of sense, because you cannot persist the open HTTP
> connection
> to the client anyway.
>
> To make the distinction clear: A blocking request can still be put
> onto the
> work-stealing queue to be balanced between different ContainerRouters.
>
> A blocking request though would never be written to a persistent buffer
> that's used to be able to efficiently handle async invocations and
> backpressuring them. That buffer should be entirely separate and could
> possibly be placed outside of the execution system to make the
> distinction
> more explicit. The execution system itself would then only deal with
> request-response style invocations and asynchronous invocations are
> done by
> having a separate queue and a consumer that creates HTTP requests to
> the
> execution system.
>
> Cheers,
> Markus
>
> On Mon, Aug 20, 2018 at 23:30, Tyson Norris
> wrote:
>
> > Thanks for summarizing Markus.
> >
> > Yes this is confusing in context of current system, which stores in
> kafka,
> > but not to indefinitely wait, since timeout begins immediately
> > So, I think the problem of buffering/queueing is: when does the
> timeout
> > begin? If not everything is buffered the same, their timeout should
> not
> > begin until processing begins.
> >
> > Maybe it would make sense to:
> > * always buffer (indefinitely) to queue for async, never for sync
> > * timeout for async not started till read from queue - which may be
> > delayed from time of trigger or http request
> > * this should also come with some system monitoring to indicate the
> queue
> > processing is not keeping up with some configurable max delay
> threshold ("I
> > can’t tolerate delays of > 5 minutes", etc)
> > * ContainerRouters can only pull from async queue when
> > * increasing the number of pending activations won’t exceed
> some
> > threshold (prevent excessive load of async on ContainerRouters)
> > 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-22 Thread Tyson Norris
Yes, agreed this makes sense, same as Carlos is saying. 

Let's ignore async for now, I think that one is simpler :) - does "A blocking 
request can still be put onto the work-stealing queue" mean that it wouldn't 
always be put on the queue? 

If there is existing warm container capacity in the ContainerRouter receiving 
the activation, ideally it would skip the queue - right? 

When exactly is the case that a ContainerRouter should put a blocking 
activation to a queue for stealing? Since a) it is not spawning containers and 
b) it is not parsing request/response bodies, can we say this would only happen 
when a ContainerRouter maxes out its incoming request handling? 

If ContainerManager has enough awareness of ContainerRouters' states, I'm not 
sure where using a queue would be used (for redirecting to other 
ContainerRouters) vs ContainerManager responding with a ContainerRouters 
reference (instead of an action container reference) - I'm not following the 
logic of the edge case in the proposal - there is mention of "which controller 
the request needs to go", but maybe this is a typo and should say 
ContainerRouter?

Thanks
Tyson

On 8/21/18, 1:16 AM, "Markus Thömmes"  wrote:

Hi Tyson,

if we take the concerns apart as I proposed above, timeouts should only
ever be triggered after a request is scheduled as you say, that is: As soon
as it's crossing the user-container mark. With the concern separation, it
is plausible that blocking invocations are never buffered anywhere, which
makes a lot of sense, because you cannot persist the open HTTP connection
to the client anyway.

To make the distinction clear: A blocking request can still be put onto the
work-stealing queue to be balanced between different ContainerRouters.

A blocking request though would never be written to a persistent buffer
that's used to be able to efficiently handle async invocations and
backpressuring them. That buffer should be entirely separate and could
possibly be placed outside of the execution system to make the distinction
more explicit. The execution system itself would then only deal with
request-response style invocations and asynchronous invocations are done by
having a separate queue and a consumer that creates HTTP requests to the
execution system.

Cheers,
Markus

On Mon, Aug 20, 2018 at 23:30, Tyson Norris
wrote:

> Thanks for summarizing Markus.
>
> Yes, this is confusing in the context of the current system, which stores in
> kafka but does not wait indefinitely, since the timeout begins immediately.
> So, I think the problem of buffering/queueing is: when does the timeout
> begin? If not everything is buffered the same, their timeout should not
> begin until processing begins.
>
> Maybe it would make sense to:
> * always buffer (indefinitely) to queue for async, never for sync
> * timeout for async not started till read from queue - which may be
> delayed from time of trigger or http request
> * this should also come with some system monitoring to indicate the queue
> processing is not keeping up with some configurable max delay threshold 
("I
> can’t tolerate delays of > 5 minutes", etc)
> * ContainerRouters can only pull from async queue when
> * increasing the number of pending activations won’t exceed some
> threshold (prevent excessive load of async on ContainerRouters)
> * ContainerManager is not overloaded (can still create containers,
> or has some configurable way to indicate the cluster is healthy enough to
> cope with extra processing)
>
> We could of course make this configurable so that operators can choose to:
> * treat async/sync activations the same for sync/async (the overloaded
> system fails when either ContainerManager or ContainerRouters are max
> capacity)
> * treat async/sync with preference for:
> * sync - where async is buffered for unknown period before
> processing, incoming sync traffic (or lack of)
> * async - where sync is sent to the queue, to be processed in
> order of receipt interleaved with async traffic (similar to today, I 
think)
>
> I think the impact here (aside from technical) is the timing difference if
> we introduce latency in side effects based on the activation being sync vs
> async.
>
> I’m also not sure prioritizing message processing between sync/async
> internally in ContainerRouter is better than just having some dedicated
> ContainerRouters that receive all async activations, and others that
> receive all sync activations, but the end result is the same, I think.
>
>
> > On Aug 19, 2018, at 4:29 AM, Markus Thömmes 
> wrote:
> >
> > Hi Tyson, Carlos,
> >
> > FWIW I should change that to no longer say "Kafka" but "buffer" or
> "message
> 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-21 Thread Markus Thömmes
Hi Tyson,

if we take the concerns apart as I proposed above, timeouts should only
ever be triggered after a request is scheduled as you say, that is: As soon
as it's crossing the user-container mark. With the concern separation, it
is plausible that blocking invocations are never buffered anywhere, which
makes a lot of sense, because you cannot persist the open HTTP connection
to the client anyway.

To make the distinction clear: A blocking request can still be put onto the
work-stealing queue to be balanced between different ContainerRouters.

A blocking request though would never be written to a persistent buffer
that's used to be able to efficiently handle async invocations and
backpressuring them. That buffer should be entirely separate and could
possibly be placed outside of the execution system to make the distinction
more explicit. The execution system itself would then only deal with
request-response style invocations and asynchronous invocations are done by
having a separate queue and a consumer that creates HTTP requests to the
execution system.
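
A minimal sketch of that separation (all names hypothetical): the buffer and
its consumer live outside the execution system, which itself only ever sees
request/response invocations.

import scala.concurrent.{ExecutionContext, Future}

final case class BufferedInvocation(action: String, payload: String)

trait AsyncBuffer { def poll(): Option[BufferedInvocation] } // persistent queue
trait HttpClient  { def post(url: String, body: String): Future[Int] }

class AsyncInvokeConsumer(buffer: AsyncBuffer, http: HttpClient, executionBase: String)(
  implicit ec: ExecutionContext) {

  // Replays buffered async invocations as ordinary request/response HTTP
  // calls against the execution system. The async timeout would start here,
  // when processing begins, not when the invocation was buffered.
  def drainOnce(): Unit =
    buffer.poll().foreach { inv =>
      http.post(s"$executionBase/run/${inv.action}", inv.payload).foreach {
        case 503 => () // execution layer overloaded: back off, retry later
        case _   => () // done; on to the next buffered invocation
      }
    }
}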

Cheers,
Markus

On Mon, Aug 20, 2018 at 23:30, Tyson Norris
wrote:

> Thanks for summarizing Markus.
>
> Yes, this is confusing in the context of the current system, which stores in
> kafka but does not wait indefinitely, since the timeout begins immediately.
> So, I think the problem of buffering/queueing is: when does the timeout
> begin? If not everything is buffered the same, their timeout should not
> begin until processing begins.
>
> Maybe it would make sense to:
> * always buffer (indefinitely) to queue for async, never for sync
> * timeout for async not started till read from queue - which may be
> delayed from time of trigger or http request
> * this should also come with some system monitoring to indicate the queue
> processing is not keeping up with some configurable max delay threshold ("I
> can’t tolerate delays of > 5 minutes", etc)
> * ContainerRouters can only pull from async queue when
> * increasing the number of pending activations won’t exceed some
> threshold (prevent excessive load of async on ContainerRouters)
> * ContainerManager is not overloaded (can still create containers,
> or has some configurable way to indicate the cluster is healthy enough to
> cope with extra processing)
>
> We could of course make this configurable so that operators can choose to:
> * treat async/sync activations the same for sync/async (the overloaded
> system fails when either ContainerManager or ContainerRouters are max
> capacity)
> * treat async/sync with preference for:
> * sync - where async is buffered for unknown period before
> processing, incoming sync traffic (or lack of)
> * async - where sync is sent to the queue, to be processed in
> order of receipt interleaved with async traffic (similar to today, I think)
>
> I think the impact here (aside from technical) is the timing difference if
> we introduce latency in side effects based on the activation being sync vs
> async.
>
> I’m also not sure prioritizing message processing between sync/async
> internally in ContainerRouter is better than just having some dedicated
> ContainerRouters that receive all async activations, and others that
> receive all sync activations, but the end result is the same, I think.
>
>
> > On Aug 19, 2018, at 4:29 AM, Markus Thömmes 
> wrote:
> >
> > Hi Tyson, Carlos,
> >
> > FWIW I should change that to no longer say "Kafka" but "buffer" or
> "message
> > queue".
> >
> > I see two use-cases for a queue here:
> > 1. What you two are alluding to: Buffering asynchronous requests because
> of
> > a different notion of "latency sensitivity" if the system is in an
> overload
> > scenario.
> > 2. As a work-stealing type balancing layer between the ContainerRouters.
> If
> > we assume round-robin/least-connected (essentially random) scheduling
> > between ContainerRouters, we will get load discrepancies between them. To
> > smoothen those out, a ContainerRouter can put the work on a queue to be
> > stolen by a Router that actually has space for that work (for example:
> > Router1 requests a new container, puts the work on the queue while it
> waits
> > for that container, Router2 already has a free container and executes the
> > action by stealing it from the queue). This does have the added complexity
> > of breaking a streaming communication between User and Container (to
> > support essentially unbounded payloads). A nasty wrinkle that might
> render
> > this design alternative invalid! We could come up with something smarter
> > here, i.e. only putting a reference to the work on the queue and the
> > stealer connects to the initial owner directly which then streams the
> > payload through to the stealer, rather than persisting it somewhere.
> >
> > It is important to note that in this design, blocking invokes could
> > potentially gain the ability to have unbounded entities, where
> > trigger/non-blocking invokes might need to be subject to a 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-20 Thread Tyson Norris
Thanks for summarizing Markus. 

Yes, this is confusing in the context of the current system, which stores in
kafka but does not wait indefinitely, since the timeout begins immediately.
So, I think the problem of buffering/queueing is: when does the timeout begin? 
If not everything is buffered the same, their timeout should not begin until 
processing begins. 

Maybe it would make sense to:
* always buffer (indefinitely) to queue for async, never for sync 
* timeout for async not started till read from queue - which may be delayed 
from time of trigger or http request
* this should also come with some system monitoring to indicate the queue 
processing is not keeping up with some configurable max delay threshold ("I 
can’t tolerate delays of > 5 minutes", etc)
* ContainerRouters can only pull from async queue when
* increasing the number of pending activations won’t exceed some 
threshold (prevent excessive load of async on ContainerRouters) 
* ContainerManager is not overloaded (can still create containers, or 
has some configurable way to indicate the cluster is healthy enough to cope 
with extra processing)

We could of course make this configurable so that operators can choose to:
* treat async/sync activations the same for sync/async (the overloaded system 
fails when either ContainerManager or ContainerRouters are max capacity)
* treat async/sync with preference for:
* sync - where async is buffered for unknown period before processing, 
incoming sync traffic (or lack of)
* async - where sync is sent to the queue, to be processed in order of 
receipt interleaved with async traffic (similar to today, I think)

I think the impact here (aside from technical) is the timing difference if we 
introduce latency in side effects based on the activation being sync vs async.

I’m also not sure prioritizing message processing between sync/async internally 
in ContainerRouter is better than just having some dedicated ContainerRouters 
that receive all async activations, and others that receive all sync 
activations, but the end result is the same, I think.
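
To make those pull conditions concrete, a sketch (the thresholds are
hypothetical operator knobs, not existing OpenWhisk config):

final case class AsyncPullPolicy(maxPendingActivations: Int, maxQueueDelayMs: Long)

class AsyncPullGuard(policy: AsyncPullPolicy, alert: String => Unit) {
  // A ContainerRouter would check this before pulling from the async queue.
  def mayPull(pendingActivations: Int,
              containerManagerHealthy: Boolean,
              oldestQueuedAgeMs: Long): Boolean = {
    if (oldestQueuedAgeMs > policy.maxQueueDelayMs)
      alert(s"async queue delay ${oldestQueuedAgeMs}ms exceeds configured max") // monitoring hook
    pendingActivations < policy.maxPendingActivations && containerManagerHealthy
  }
}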


> On Aug 19, 2018, at 4:29 AM, Markus Thömmes  wrote:
> 
> Hi Tyson, Carlos,
> 
> FWIW I should change that to no longer say "Kafka" but "buffer" or "message
> queue".
> 
> I see two use-cases for a queue here:
> 1. What you two are alluding to: Buffering asynchronous requests because of
> a different notion of "latency sensitivity" if the system is in an overload
> scenario.
> 2. As a work-stealing type balancing layer between the ContainerRouters. If
> we assume round-robin/least-connected (essentially random) scheduling
> between ContainerRouters, we will get load discrepancies between them. To
> smoothen those out, a ContainerRouter can put the work on a queue to be
> stolen by a Router that actually has space for that work (for example:
> Router1 requests a new container, puts the work on the queue while it waits
> for that container, Router2 already has a free container and executes the
> action by stealing it from the queue). This does have the added complexity
> of breaking a streaming communication between User and Container (to
> support essentially unbounded payloads). A nasty wrinkle that might render
> this design alternative invalid! We could come up with something smarter
> here, i.e. only putting a reference to the work on the queue and the
> stealer connects to the initial owner directly which then streams the
> payload through to the stealer, rather than persisting it somewhere.
> 
> It is important to note that in this design, blocking invokes could
> potentially gain the ability to have unbounded entities, where
> trigger/non-blocking invokes might need to be subject to a bound here to be
> able to support eventual execution efficiently.
> 
> Personally, I'm much more drawn to the work-stealing type case. It implies a
> wholly different notion of using the queue though and doesn't have much to
> do with the way we use it today, which might be confusing. It could also
> well be the case that work-stealing type algorithms are easier to back on
> a proper MQ vs. trying to make it work on Kafka.
> 
> It might also be important to note that those two use-cases might require
> different technologies (buffering vs. queue-backend for work-stealing) and
> could well be separated in the design as well. For instance, buffering
> triggers fires etc. does not necessarily need to be done on the execution
> layer but could instead be pushed to another layer. Having the notion of
> "async" vs "sync" in the execution layer could be benefitial for
> loadbalancing itself though. Something worth exploring imho.
> 
> Sorry for the wall of text, I hope this clarifies things!
> 
> Cheers,
> Markus
> 
> On Sat, Aug 18, 2018 at 02:36, Carlos Santana <
> csantan...@gmail.com> wrote:
> 
>> triggers get responded right away (202) with an activation id and then
>> sent to the queue to be processed async same as async action invokes.
>> 
>> I 

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-20 Thread Markus Thömmes
I believe we should keep this as general as long as possible. We should
define the characteristics we need for each path rather than deciding on a
certain technology early on.

On Sun, Aug 19, 2018 at 16:07, Dascalita Dragos <
ddrag...@gmail.com> wrote:

> “... FWIW I should change that to no longer say "Kafka" but "buffer" or
> "message
> queue"...”
> +1. One idea could be to use Akka Streams and let the OW operator make a
> decision on using Kafka with Akka Streams, or not [1]. This would make OW
> deployment easier, Kafka becoming optional, while opening the door for
> other connectors like AWS Kinesis, Azure Event Hub, and others (see the
> link at [1] for a more complete list of connectors )
>
> [1] - https://developer.lightbend.com/docs/alpakka/current/
> On Sun, Aug 19, 2018 at 7:30 AM Markus Thömmes 
> wrote:
>
> > Hi Tyson, Carlos,
> >
> > FWIW I should change that to no longer say "Kafka" but "buffer" or
> "message
> > queue".
> >
> > I see two use-cases for a queue here:
> > 1. What you two are alluding to: Buffering asynchronous requests because
> > of a different notion of "latency sensitivity" if the system is in an
> > overload scenario.
> > 2. As a work-stealing type balancing layer between the ContainerRouters.
> > If we assume round-robin/least-connected (essentially random) scheduling
> > between ContainerRouters, we will get load discrepancies between them. To
> > smooth those out, a ContainerRouter can put the work on a queue to be
> > stolen by a Router that actually has space for that work (for example:
> > Router1 requests a new container, puts the work on the queue while it
> > waits for that container, Router2 already has a free container and
> > executes the action by stealing it from the queue). This does have the
> > added complexity of breaking a streaming communication between User and
> > Container (to support essentially unbounded payloads). A nasty wrinkle
> > that might render this design alternative invalid! We could come up with
> > something smarter here, i.e. putting only a reference to the work on the
> > queue; the stealer then connects to the initial owner directly, which
> > streams the payload through to the stealer rather than persisting it
> > somewhere.
> >
> > It is important to note that, in this design, blocking invokes could
> > potentially gain the ability to have unbounded entities, whereas
> > trigger/non-blocking invokes might need to be subject to a bound here to
> > be able to support eventual execution efficiently.
> >
> > Personally, I'm much more drawn to the work-stealing type case. It
> > implies a wholly different notion of using the queue though and doesn't
> > have much to do with the way we use it today, which might be confusing.
> > It could also well be the case that work-stealing type algorithms are
> > easier to back on a proper MQ vs. trying to make it work on Kafka.
> >
> > It might also be important to note that those two use-cases might require
> > different technologies (buffering vs. queue-backend for work-stealing)
> > and could well be separated in the design as well. For instance,
> > buffering trigger fires etc. does not necessarily need to be done on the
> > execution layer but could instead be pushed to another layer. Having the
> > notion of "async" vs "sync" in the execution layer could be beneficial
> > for load balancing itself though. Something worth exploring imho.
> >
> > Sorry for the wall of text, I hope this clarifies things!
> >
> > Cheers,
> > Markus
> >
> > Am Sa., 18. Aug. 2018 um 02:36 Uhr schrieb Carlos Santana <
> > csantan...@gmail.com>:
> >
> > > triggers get responded to right away (202) with an activation id and
> > > then sent to the queue to be processed asynchronously, same as async
> > > action invokes.
> > >
> > > I think we would keep the same contract as today for this type of
> > > activation, which is eventually processed, different from blocking
> > > invokes and Actions where the http client holds a connection waiting
> > > for the result back.
> > >
> > > - Carlos Santana
> > > @csantanapr
> > >
> > > > On Aug 17, 2018, at 6:14 PM, Tyson Norris wrote:
> > > >
> > > > Hi -
> > > > Separate thread regarding the proposal: what is considered overload
> > > > when routing activations, such that they are destined for kafka?
> > > >
> > > > In general, if kafka is not on the blocking activation path, why
> > > > would it be used at all, if the timeouts and processing expectations
> > > > of blocking and non-blocking are the same?
> > > >
> > > > One case I can imagine: triggers + non-blocking invokes, but only in
> > > > the case where those have some different timeout characteristics.
> > > > E.g. if a trigger fires an action, is there any case where the
> > > > activation should be buffered to kafka if it will time out the same
> > > > as a blocking activation?
> > > >
> > > > Sorry if I’m missing something obvious.
> > > >
> > > > Thanks
> > > > Tyson
> > > >
> > > >
> > >
> >
>


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Dascalita Dragos
“... FWIW I should change that to no longer say "Kafka" but "buffer" or
"message queue"...”
+1. One idea could be to use Akka Streams and let the OW operator make a
decision on using Kafka with Akka Streams, or not [1]. This would make OW
deployment easier, Kafka becoming optional, while opening the door for
other connectors like AWS Kinesis, Azure Event Hub, and others (see the
link at [1] for a more complete list of connectors)

[1] - https://developer.lightbend.com/docs/alpakka/current/
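
As a rough illustration of that idea (assuming the Alpakka Kafka connector;
ActivationSource and the settings shown are made up for the sketch), the
routers could consume from a generic akka-stream Source, with Kafka as just
one pluggable backing:

    import akka.actor.ActorSystem
    import akka.kafka.{ConsumerSettings, Subscriptions}
    import akka.kafka.scaladsl.Consumer
    import akka.stream.scaladsl.Source
    import org.apache.kafka.common.serialization.StringDeserializer

    object ActivationSource {
      // Kafka-backed variant: only this function knows about Kafka.
      def kafka(topic: String)(implicit system: ActorSystem): Source[String, _] = {
        val settings =
          ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
            .withBootstrapServers("localhost:9092") // deployment-specific
            .withGroupId("container-routers")
        Consumer.plainSource(settings, Subscriptions.topics(topic)).map(_.value)
      }

      // Kafka-less variant with the same shape, e.g. for small deployments.
      def inMemory(items: List[String]): Source[String, _] = Source(items)
    }
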
On Sun, Aug 19, 2018 at 7:30 AM Markus Thömmes 
wrote:

> Hi Tyson, Carlos,
>
> FWIW I should change that to no longer say "Kafka" but "buffer" or "message
> queue".
>
> I see two use-cases for a queue here:
> 1. What you two are alluding to: Buffering asynchronous requests because of
> a different notion of "latency sensitivity" if the system is in an overload
> scenario.
> 2. As a work-stealing type balancing layer between the ContainerRouters. If
> we assume round-robin/least-connected (essentially random) scheduling
> between ContainerRouters, we will get load discrepancies between them. To
> smooth those out, a ContainerRouter can put the work on a queue to be
> stolen by a Router that actually has space for that work (for example:
> Router1 requests a new container, puts the work on the queue while it waits
> for that container, Router2 already has a free container and executes the
> action by stealing it from the queue). This does have the added complexity
> of breaking a streaming communication between User and Container (to
> support essentially unbounded payloads). A nasty wrinkle that might render
> this design alternative invalid! We could come up with something smarter
> here, i.e. putting only a reference to the work on the queue; the stealer
> then connects to the initial owner directly, which streams the payload
> through to the stealer rather than persisting it somewhere.
>
> It is important to note that, in this design, blocking invokes could
> potentially gain the ability to have unbounded entities, whereas
> trigger/non-blocking invokes might need to be subject to a bound here to be
> able to support eventual execution efficiently.
>
> Personally, I'm much more drawn to the work-stealing type case. It implies a
> wholly different notion of using the queue though and doesn't have much to
> do with the way we use it today, which might be confusing. It could also
> well be the case that work-stealing type algorithms are easier to back on
> a proper MQ vs. trying to make it work on Kafka.
>
> It might also be important to note that those two use-cases might require
> different technologies (buffering vs. queue-backend for work-stealing) and
> could well be separated in the design as well. For instance, buffering
> trigger fires etc. does not necessarily need to be done on the execution
> layer but could instead be pushed to another layer. Having the notion of
> "async" vs "sync" in the execution layer could be beneficial for
> load balancing itself though. Something worth exploring imho.
>
> Sorry for the wall of text, I hope this clarifies things!
>
> Cheers,
> Markus
>
> Am Sa., 18. Aug. 2018 um 02:36 Uhr schrieb Carlos Santana <
> csantan...@gmail.com>:
>
> > triggers get responded to right away (202) with an activation id and then
> > sent to the queue to be processed asynchronously, same as async action
> > invokes.
> >
> > I think we would keep the same contract as today for this type of
> > activation, which is eventually processed, different from blocking invokes
> > and Actions where the http client holds a connection waiting for the
> > result back.
> >
> > - Carlos Santana
> > @csantanapr
> >
> > > On Aug 17, 2018, at 6:14 PM, Tyson Norris wrote:
> > >
> > > Hi -
> > > Separate thread regarding the proposal: what is considered overload
> > > when routing activations, such that they are destined for kafka?
> > >
> > > In general, if kafka is not on the blocking activation path, why would
> > > it be used at all, if the timeouts and processing expectations of
> > > blocking and non-blocking are the same?
> > >
> > > One case I can imagine: triggers + non-blocking invokes, but only in
> > > the case where those have some different timeout characteristics.
> > > E.g. if a trigger fires an action, is there any case where the
> > > activation should be buffered to kafka if it will time out the same as
> > > a blocking activation?
> > >
> > > Sorry if I’m missing something obvious.
> > >
> > > Thanks
> > > Tyson
> > >
> > >
> >
>


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Markus Thömmes
Hi Tyson, Carlos,

FWIW I should change that to no longer say "Kafka" but "buffer" or "message
queue".

I see two use-cases for a queue here:
1. What you two are alluding to: Buffering asynchronous requests because of
a different notion of "latency sensitivity" if the system is in an overload
scenario.
2. As a work-stealing type balancing layer between the ContainerRouters. If
we assume round-robin/least-connected (essentially random) scheduling
between ContainerRouters, we will get load discrepancies between them. To
smooth those out, a ContainerRouter can put the work on a queue to be
stolen by a Router that actually has space for that work (for example:
Router1 requests a new container, puts the work on the queue while it waits
for that container, Router2 already has a free container and executes the
action by stealing it from the queue). This does have the added complexity
of breaking a streaming communication between User and Container (to
support essentially unbounded payloads). A nasty wrinkle that might render
this design alternative invalid! We could come up with something smarter
here, i.e. putting only a reference to the work on the queue; the stealer
then connects to the initial owner directly, which streams the payload
through to the stealer rather than persisting it somewhere.
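
A deliberately naive, in-process sketch of that flow (in a real deployment
the queue would be a distributed MQ; all names here are invented):

    import java.util.concurrent.LinkedBlockingQueue

    /** A reference to the work, not the payload: the stealer calls back to
      * the owning Router, which streams the user's request through (the
      * "something smarter" variant above). */
    final case class WorkRef(activationId: String, action: String, ownerUrl: String)

    object StealingQueue {
      private val queue = new LinkedBlockingQueue[WorkRef]()

      /** Router1: no free container, so request one AND publish a reference;
        * whichever capacity frees up first executes the action. */
      def publish(ref: WorkRef): Unit = queue.offer(ref)

      /** Router2: has spare capacity, so steal pending work if any exists.
        * (A real MQ would give competing-consumer semantics instead.) */
      def steal(): Option[WorkRef] = Option(queue.poll())
    }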

It is important to note that, in this design, blocking invokes could
potentially gain the ability to have unbounded entities, whereas
trigger/non-blocking invokes might need to be subject to a bound here to be
able to support eventual execution efficiently.
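
In code, such a bound could be as simple as an admission check like the
following (MaxQueuedPayload is an invented knob for the sketch):

    // Blocking invokes can stream and thus stay unbounded; anything that is
    // buffered for eventual execution must fit the queue's payload limit.
    val MaxQueuedPayload: Long = 1024L * 1024L // 1 MB, assumed for the sketch

    def admit(blocking: Boolean, payloadSize: Long): Boolean =
      blocking || payloadSize <= MaxQueuedPayload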

Personally, I'm much more drawn to the work-stealing type case. It implies a
wholly different notion of using the queue though and doesn't have much to
do with the way we use it today, which might be confusing. It could also
well be the case that work-stealing type algorithms are easier to back on
a proper MQ vs. trying to make it work on Kafka.

It might also be important to note that those two use-cases might require
different technologies (buffering vs. queue-backend for work-stealing) and
could well be separated in the design as well. For instance, buffering
trigger fires etc. does not necessarily need to be done on the execution
layer but could instead be pushed to another layer. Having the notion of
"async" vs "sync" in the execution layer could be beneficial for
load balancing itself though. Something worth exploring imho.
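
One way to explore that: tag every invocation with its latency class when it
enters the execution layer, so the load balancer can shed async work into the
buffer first. A sketch (all names invented):

    sealed trait LatencyClass
    case object Sync  extends LatencyClass // an http client holds a connection
    case object Async extends LatencyClass // trigger fire / non-blocking invoke

    final case class Invocation(action: String, latency: LatencyClass)

    /** Under overload, only async work is diverted to the buffering layer;
      * sync work keeps going straight to a ContainerRouter. */
    def route(inv: Invocation, overloaded: Boolean): String =
      (inv.latency, overloaded) match {
        case (Async, true) => "buffer"
        case _             => "execute"
      }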

Sorry for the wall of text, I hope this clarifies things!

Cheers,
Markus

Am Sa., 18. Aug. 2018 um 02:36 Uhr schrieb Carlos Santana <
csantan...@gmail.com>:

> triggers get responded to right away (202) with an activation id and then
> sent to the queue to be processed asynchronously, same as async action
> invokes.
>
> I think we would keep the same contract as today for this type of
> activation, which is eventually processed, different from blocking invokes
> and Actions where the http client holds a connection waiting for the result
> back.
>
> - Carlos Santana
> @csantanapr
>
> > On Aug 17, 2018, at 6:14 PM, Tyson Norris wrote:
> >
> > Hi -
> > Separate thread regarding the proposal: what is considered overload when
> > routing activations, such that they are destined for kafka?
> >
> > In general, if kafka is not on the blocking activation path, why would
> > it be used at all, if the timeouts and processing expectations of
> > blocking and non-blocking are the same?
> >
> > One case I can imagine: triggers + non-blocking invokes, but only in the
> > case where those have some different timeout characteristics. E.g. if a
> > trigger fires an action, is there any case where the activation should be
> > buffered to kafka if it will time out the same as a blocking activation?
> >
> > Sorry if I’m missing something obvious.
> >
> > Thanks
> > Tyson
> >
> >
>


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-17 Thread Carlos Santana
triggers get responded to right away (202) with an activation id and then sent
to the queue to be processed asynchronously, same as async action invokes.
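
A minimal sketch of that contract (assuming Akka HTTP; the in-memory queue is
a placeholder for the real buffer/MQ, not an actual OpenWhisk API):

    import java.util.UUID
    import java.util.concurrent.LinkedBlockingQueue
    import akka.http.scaladsl.model.StatusCodes
    import akka.http.scaladsl.server.Directives._

    object TriggerApi {
      // Stand-in for the buffer/MQ the activation is handed off to.
      private val queue = new LinkedBlockingQueue[(String, String)]()

      val route =
        path("triggers" / Segment) { trigger =>
          post {
            val activationId = UUID.randomUUID().toString
            queue.offer(activationId -> trigger) // processed async, like async invokes
            complete(StatusCodes.Accepted -> activationId) // 202 + activation id, right away
          }
        }
    }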

I think we would keep the same contract as today for this type of activation,
which is eventually processed, different from blocking invokes and Actions
where the http client holds a connection waiting for the result back.

- Carlos Santana
@csantanapr

> On Aug 17, 2018, at 6:14 PM, Tyson Norris  wrote:
> 
> Hi - 
> Separate thread regarding the proposal: what is considered overload when
> routing activations, such that they are destined for kafka?
> 
> In general, if kafka is not on the blocking activation path, why would it be 
> used at all, if the timeouts and processing expectations of blocking and 
> non-blocking are the same?
> 
> One case I can imagine: triggers + non-blocking invokes, but only in the case
> where those have some different timeout characteristics. E.g. if a trigger
> fires an action, is there any case where the activation should be buffered to
> kafka if it will time out the same as a blocking activation?
> 
> Sorry if I’m missing something obvious.
> 
> Thanks
> Tyson
> 
>