Re: Prototyping for a future architecture

2018-08-29 Thread TzuChiao Yeh
Hi Markus,

Thanks! Yes, the design is based on what you've put out a few weeks ago. The
biggest difference is that there is no ContainerRouter; that logic is
implemented in the LoadBalancer implementation instead. I've changed the
terminology to follow a shorter Scala-based project naming convention, but it
basically follows your idea. For the detailed protocols and design I've tried
my own way, and because of the GSoC schedule deadline I chose the approach I'm
more comfortable with and could only afford a subset of the overall design:
i.e. Akka message passing instead of HTTP, no shared queues, no Kubernetes,
reusing the invoker-agent for pause/unpause, no error path, no logging, and it
may not be race-free, etc. I don't expect anything in this implementation to
be reused; I hope this won't lead to misunderstanding among folks.

Thanks,
Tzu-Chiao

On Wed, Aug 29, 2018 at 5:00 PM Markus Thömmes 
wrote:

> Hi Tzu-Chiao,
>
> first of all: This looks amazing! As the terminology is similar: Is this a
> separate design or is this based on what I've put out a few weeks ago?
>
> In general: Everybody is eligible to join the prototyping effort, I think
> that's the whole point of having that discussion here. You're very welcome
> to join there and it seems you'll even be more knowledgeable on some of the
> things than anybody else, so it'd be awesome to get your experience in
> there!
>
> Cheers,
> Markus
>
> On Wed, Aug 29, 2018 at 10:48 AM TzuChiao Yeh <
> su3g4284zo...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm wondering whether people interested from outside the core team (like
> > me) are eligible to join the prototyping? I know it will increase the
> > maintenance effort of keeping everything visible: opening and labeling
> > issues, plans, assignees, etc. If yes, from my point of view, a standalone
> > repo with its own issue tracker and pull-request workflow might make it
> > easier to achieve that visibility and communication. Would it be possible
> > to use a submodule if code reuse or a clear migration path is a concern?
> >
> > For prototyping:
> > I've drafted a small prototype [1] over the last months. Bear with me while
> > I introduce some background about myself: some folks might know that I'm a
> > master's student and attended Google Summer of Code this year with
> > OpenWhisk, mentored by Rodric and Carlos. The prototype is the submitted
> > result of my final work during the last few months, although it's buggy,
> > ill-designed and not finished yet. However, if we're ready to start
> > prototyping now, bear with me showing it off in a rough state, as that was
> > the purpose of my past work; I hope it helps to explore ideas before going
> > deeper into collaboration. (I've already asked about most of the issues
> > I've found in previous threads, though they may already be solved in
> > people's minds.)
> >
> > [1] https://tz70s.github.io/posts/openwhisk-performance-improvement/
> >
> > Thanks,
> > Tzu-Chiao
> >
> > On Wed, Aug 29, 2018 at 11:31 AM Markus Thömmes <
> markusthoem...@apache.org
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > okay, let's separate concerns:
> > >
> > > ### Golang vs. Scala vs. something else entirely: ###
> > >
> > > I'm okay doing it in Scala although I don't see an issue in swapping
> > > the language for a component that needs to be rewritten from the ground
> > > up anyway (which I'm suggesting at least the ContainerRouter is).
> > >
> > > The ContainerPool/ContainerProxy part of it comes to mind immediately.
> > > It was built with a scale of maybe 100 containers max in mind. There
> > > are a few performance problems with it at scale that I can point out
> > > off the bat.
> > > The changes needed to mitigate those are rather severe and while we
> > > **can** implement those (and maybe even should if this proves to make
> > > sense) I believe it makes sense to at least experiment with a freshly
> > > written component. In the new architecture, we can also greatly reduce
> > > the state model to just paused/unpaused for the ContainerRouter,
> > > further simplifying its implementation.
> > >
> > > The ContainerRouter's speed requirements are closer to those of
> > > nginx/envoy than they are to those of the current ContainerPool/Proxy
> > > implementation. As Dave mentioned, for the ContainerRouter it will be
> > > very critical to know how many containers it can handle in reality,
> > > hence I'm shooting for as efficient an implementation of its innards as
> > > possible to flesh out how far we can pus

Re: Prototyping for a future architecture

2018-08-29 Thread TzuChiao Yeh
Hi,

I'm wondering whether people interested from outside the core team (like me)
are eligible to join the prototyping? I know it will increase the maintenance
effort of keeping everything visible: opening and labeling issues, plans,
assignees, etc. If yes, from my point of view, a standalone repo with its own
issue tracker and pull-request workflow might make it easier to achieve that
visibility and communication. Would it be possible to use a submodule if code
reuse or a clear migration path is a concern?

For prototyping:
I've drafted a small prototype [1] over the last months. Bear with me while I
introduce some background about myself: some folks might know that I'm a
master's student and attended Google Summer of Code this year with OpenWhisk,
mentored by Rodric and Carlos. The prototype is the submitted result of my
final work during the last few months, although it's buggy, ill-designed and
not finished yet. However, if we're ready to start prototyping now, bear with
me showing it off in a rough state, as that was the purpose of my past work; I
hope it helps to explore ideas before going deeper into collaboration. (I've
already asked about most of the issues I've found in previous threads, though
they may already be solved in people's minds.)

[1] https://tz70s.github.io/posts/openwhisk-performance-improvement/

Thanks,
Tzu-Chiao

On Wed, Aug 29, 2018 at 11:31 AM Markus Thömmes 
wrote:

> Hi,
>
> okay, let's separate concerns:
>
> ### Golang vs. Scala vs. something else entirely: ###
>
> I'm okay doing it in Scala although I don't see an issue in swapping the
> language for a component that needs to be rewritten from the ground up
> anyway (which I'm suggesting at least the ContainerRouter is).
>
> The ContainerPool/ContainerProxy part of it comes to mind immediately. It
> was built with a scale of maybe 100 containers max in mind. There are a few
> performance problems with it at scale that I can point out off the bat.
> The changes needed to mitigate those are rather severe and while we **can**
> implement those (and maybe even should if this proves to make sense) I
> believe it makes sense to at least experiment with a freshly written
> component. In the new architecture, we can also greatly reduce the state
> model to just paused/unpaused for the ContainerRouter, further simplifying
> its implementation.
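>
> A tiny sketch of what that reduced state model could look like (type and
> names are hypothetical, not actual code):
>
>     sealed trait RouterContainerState
>     case object Unpaused extends RouterContainerState // currently serving requests
>     case object Paused extends RouterContainerState   // idle, resumable on demand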
>
> The ContainerRouter's speed requirements are closer to those of
> nginx/envoy than they are to those of the current ContainerPool/Proxy
> implementation. As Dave mentioned, for the ContainerRouter it will be very
> critical to know how many containers it can handle in reality, hence I'm
> shooting for as efficient an implementation of its innards as possible to
> flesh out how far we can push the boundaries here.
>
> In any case: Yes we can stick to Scala here, but at least the
> ContainerRouter might be 100% new code and we might not even be able to
> rely on things like akka-http for its implementation for the reasons
> mentioned above.
>
> ### Separate repo vs. branch ###
>
> Towards Michaels points: The prototype repo is supposed to be used for
> experimenting/prototyping and for that purpose only. As Dave pointed out
> above, we'd want these experiments to happen with Apache visibility. There
> will never be a completely working system available in that repository. The
> goal of it is to find out if certain assumptions of a new design are
> feasible or not. The goal is **not** to build productised components that
> we can take over to the mainline.
>
> I always thought of that rearch as an incremental step as well and it
> shouldn't need major changes to the API (or at least none that we are not
> already introducing right now). I don't think we're going to see the true
> potential of this though if we try to take the system's components as is
> and reuse them in contexts they have never been built for in the first
> place.
>
> As soon as we have clearance over which way we want to ultimately go with
> the architecture, I agree we should come up with a nice migration plan and
> start building it into the main repository through that migration plan. The
> prototyping repository should vanish at some point.
>
> ### Summary ###
>
> I don't feel super strong about a choice of language, I just thought I'd
> throw that in the ring as well. As sentiments seem quite strong against
> doing that right now, which is perfectly fine, I'd be fine doing the
> implementations in Scala. The disclaimer though is, that I'd like to write
> critical components afresh to not have to fight with existing architectural
> choices too much but to be able to think freely about the problems at hand.
>
> Which is why I do feel quite strongly about not doing this in a branch. The
> structure of the project will be very different and I can easily see people
> confused if we start having pull-requests against a "prototype" branch
> mixed in with our usual pull-requests.
>
> I do however agree that we eventually will need to build this in an
> incremental way, once we 

Re: Proposal on a future architecture of OpenWhisk

2018-08-20 Thread TzuChiao Yeh
Yes, exactly.

Sorry if my poor English is bothering you :(. I'll try my best to correct the
text. I don't have an exact model in mind, but let me share some thoughts that
might be helpful:

As you said before, there are some pre-conditions to the scheduling decision:
unbounded/bounded system, fair/unfair scheduling, etc.

For an unbounded system, providers may not care that much about the
over-estimation problem; in contrast, a resource-bounded system cares about
the "overall throughput" and keeping bounded resource utilization stable,
which potentially leads to fair scheduling decisions: "paying penalties as you
go more". Therefore, the following mechanism is based on the assumption of a
bounded system.

I've read an academic paper quite relevant to this, but I forget some of the
details and will re-read it later. The basic idea is to split the queue into a
warm queue and a cold-start queue, and to add a delay (penalty) on pulling
from the cold-start queue. In the context of OW:

1. ContainerRouters duplicate and enqueue the activation (reference) into both
the warm and the cold-start queue.
2. A ContainerRouter pulls the activation (reference) from the warm queue once
a container is available again, and drops the activation (reference) from the
cold-start queue.
3. The ContainerManager pulls activations (references) from the cold-start
queue as creation requests, with an "incremental delay".
4. Continuing (3), the ContainerManager doesn't pull activations (references)
from the warm queue. If the activation (reference) has already been stolen by
a ContainerRouter during creation, an "over-estimate" has occurred.

I believe scheduling in the serverless model should take into account how many
events are queued relative to the available resource slots, and even how many
slots are already allocated to a specific action, namespace, etc. in a bounded
system. The critical point is how we set the incremental delay, but I think
the ContainerManager potentially has enough information to make a smarter
decision based on these metrics. In addition, since this is not on the
critical path, we can afford a slight increase in latency here for better
system throughput.

I.e. an intuitive approach:

  NextPollDelay = DelayFactor * IncrementalFactor * CurrentAllocatedSlotsRatio
                  * NumOfOverEstimate / QueuedEvents

And we can let the user configure the delay factor, i.e. 0 for no poll delay
in a system that doesn't really care about this (so we can have a unified
model for both bounded and unbounded systems), or a customized value for how
much penalty one is willing to pay if a burst occurs.
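
To make the formula concrete, a minimal sketch in Scala (all names are
hypothetical, not existing OpenWhisk code):

    import scala.concurrent.duration._

    case class PollState(
      delayFactor: Double,         // operator-configured; 0 disables the penalty
      incrementalFactor: Double,   // grows while cold starts keep piling up
      allocatedSlotsRatio: Double, // already-allocated slots / total slots
      overEstimates: Long,         // creations stolen back by a warm container
      queuedEvents: Long)          // current depth of the cold-start queue

    def nextPollDelay(s: PollState): FiniteDuration =
      if (s.queuedEvents == 0 || s.delayFactor == 0) Duration.Zero
      else {
        val ms = s.delayFactor * s.incrementalFactor * s.allocatedSlotsRatio *
          s.overEstimates / s.queuedEvents
        math.round(ms).millis
      }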

This is quite straightforward and probably has plenty of problems, i.e.
1. serverless workloads have uncertain elapsed times, 2. the latency the OW
system needs to acquire information and make the decision, 3. message-queue
operation latency, 4. it gets more complex once the pre-warm model and
priorities join the scheduling, 5. will this break the serverless pricing
model? 6. ...

I think this doesn't significantly change the big picture of the future
architecture and shouldn't stop us from moving forward now. If folks are more
interested in the over-estimation problem, we can work out a proper solution
once we have more detail on the future architecture, since throttling already
helps us avoid this situation.

Thanks!

On Mon, Aug 20, 2018 at 4:03 PM Markus Thömmes 
wrote:

> On Sun, Aug 19, 2018 at 6:59 PM TzuChiao Yeh <
> su3g4284zo...@gmail.com> wrote:
>
> > On Sun, Aug 19, 2018 at 7:13 PM Markus Thömmes <
> markusthoem...@apache.org>
> > wrote:
> >
> > > Hi Tzu-Chiao,
> > >
> > > On Sat, Aug 18, 2018 at 6:56 AM TzuChiao Yeh <
> > > su3g4284zo...@gmail.com> wrote:
> > >
> > > > Hi Markus,
> > > >
> > > > Nice thoughts on separating logics in this revision! I'm not sure
> this
> > > > question has already been clarified, sorry if duplicate.
> > > >
> > > > Same question on cluster singleton:
> > > >
> > > > I think there will be two possibilities on container deletion: 1.
> > > > ContainerRouter removes it (when error or idle-state) 2.
> > ContainerManager
> > > > decides to remove it (i.e. clear space for new creation).
> > > >
> > > > For case 2, how do we ensure the safe deletion in ContainerManager?
> > > > Consider if there's still a similar model on busy/free/prewarmed
> pool,
> > it
> > > > might require additional states related to containers from busy to
> free
> > > > state, then we can safely remove it or reject if nothing found
> (system
> > > > overloaded).
> > > >
> > > > By paused state or other states/message? There might be some
> trade-offs
> > > on
> > > > granularity (time-slice in scheduling) and performance bottleneck on
> > &g

Re: Proposal on a future architecture of OpenWhisk

2018-08-19 Thread TzuChiao Yeh
On Sun, Aug 19, 2018 at 7:13 PM Markus Thömmes 
wrote:

> Hi Tzu-Chiao,
>
> On Sat, Aug 18, 2018 at 6:56 AM TzuChiao Yeh <
> su3g4284zo...@gmail.com> wrote:
>
> > Hi Markus,
> >
> > Nice thoughts on separating the logic in this revision! I'm not sure
> > whether this question has already been clarified; sorry if it's a duplicate.
> >
> > Same question on cluster singleton:
> >
> > I think there are two possibilities for container deletion: 1. a
> > ContainerRouter removes it (on error or idle state), 2. the ContainerManager
> > decides to remove it (i.e. to clear space for a new creation).
> >
> > For case 2, how do we ensure safe deletion in the ContainerManager? If
> > there's still a similar model with busy/free/prewarmed pools, it might
> > require additional state to move a container from busy to free before we
> > can safely remove it, or to reject the request if nothing is found (system
> > overloaded).
> >
> > Via the paused state or some other state/message? There might be some
> > trade-offs in granularity (the time-slice in scheduling) and a performance
> > bottleneck on the ClusterSingleton.
> >

> I'm not sure if I quite got the point, but here's an attempt at an
> explanation:
>
> Yes, Container removal in case 2 is triggered from the ContainerManager. To
> be able to safely remove it, it requests all ContainerRouters owning that
> container to stop serving it and hand it back. Once it's been handed back,
> the ContainerManager can safely delete it. The contract should also say: A
> container must be handed back in unpaused state, so it can be deleted
> safely. Since the ContainerRouters handle pause/unpause, they'll need to
> stop serving the container, unpause it, remove it from their state and
> acknowledge to the ContainerManager that they handed it back.
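
So, if I understand correctly, the contract is roughly something like the
sketch below (the message names are only my guess, not the prototype's actual
protocol):

    import akka.actor.ActorRef

    sealed trait ContainerLifecycle
    // ContainerManager -> ContainerRouter: stop serving, unpause, hand back.
    case class RequestHandBack(containerId: String) extends ContainerLifecycle
    // ContainerRouter -> ContainerManager: removed from my state, unpaused,
    // safe to delete now.
    case class HandedBack(containerId: String, router: ActorRef) extends ContainerLifecycle

    // ContainerManager deletes only after every owning router has acknowledged.
    def safeToDelete(owners: Set[ActorRef], acks: Set[ActorRef]): Boolean =
      owners.subsetOf(acks)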
>

Thank you, it's clear to me.


> There is an open question on when to consider a system to be in overflow
> state, or rather: How to handle the edge-situation. If you cannot generate
> more containers, we need to decide whether we remove another container (the
> case you're describing) or if we call it quits and say "503, overloaded, go
> away for now". The logic deciding this is up for discussion as well. The
> heuristic could take into account how many resources in the whole system
> you already own, how many resources do others own and if we want to decide
> to share those fairly or not-fairly. Note that this is also very much
> related to being able to scale the resources up in themselves (to be able
> to generate new containers). If we assume a bounded system though, yes,
> we'll need to find a strategy on how to handle this case. I believe with
> the state the ContainerManager has, it can provide a more eloquent answer
> to that question than what we can do today (nothing really, we just keep on
> churning through containers).
>

I agree. An additional problem is that in the case of burst requests, the
ContainerManager will "over-estimate" the container allocation, whether or not
work-stealing between ContainerRouters is enabled. In a bounded system, we had
better handle this carefully to avoid frequent creation/deletion. I'm
wondering whether sharing a message queue with the ContainerManager (since
it's not on the critical path), or some mechanism for checking queue size
(i.e. checking Kafka lag), could eliminate this? However, this may only happen
for short-running tasks, and throttling already helps here.
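
For illustration, computing that lag could look roughly like the sketch below
(only my assumption of the mechanism, not existing OpenWhisk code):

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    // Sum of (end offset - committed offset) over all partitions of a topic.
    def estimatedLag(consumer: KafkaConsumer[String, Array[Byte]], topic: String): Long = {
      val partitions = consumer.partitionsFor(topic).asScala
        .map(info => new TopicPartition(topic, info.partition))
      val ends = consumer.endOffsets(partitions.asJava).asScala
      partitions.map { tp =>
        val committed = Option(consumer.committed(tp)).map(_.offset).getOrElse(0L)
        val end = ends.get(tp).map(_.longValue).getOrElse(committed)
        end - committed
      }.sum
    }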


> Does that answer the question?


> >
> > Thanks!
> >
> > Tzu-Chiao
> >
> > On Sat, Aug 18, 2018 at 5:55 AM Tyson Norris 
> > wrote:
> >
> > > Ugh my reply formatting got removed!!! Trying this again with some >>
> > >
> > > On Aug 17, 2018, at 2:45 PM, Tyson Norris wrote:
> > >
> > >
> > > If the failover of the singleton is too long (I think it will be based
> on
> > > cluster size, oldest node becomes the singleton host iirc), I think we
> > need
> > > to consider how containers can launch in the meantime. A first step
> might
> > > be to test out the singleton behavior in the cluster of various sizes.
> > >
> > >
> > > I agree this bit of design is crucial, a few thoughts:
> > > Pre-warm wouldn't help here, the ContainerRouters only know warm
> > > containers. Pre-warming is managed by the ContainerManager.
> > >
> > >
> > > >> Ah right
> > >
> > >
> > >
> > > Considering a fail-over scenario: We could consider sharing the state
> via
> > > EventSourcing. That is: All state lives inside

Re: Proposing Lean OpenWhisk

2018-07-18 Thread TzuChiao Yeh
Hi David,

That definitely makes sense :) We may have an alternative option (i.e. a
"native" function mode with some limitations) after working out more of the
details. I agree that moving to the edge requires a long-term process due to
the uncertainty.

Anyway, AFAIK plenty of industry and academic groups rely on OpenWhisk to
explore edge computing cases. Looking forward to this getting merged!

On Wed, Jul 18, 2018 at 7:43 PM David Breitgand  wrote:

> Hi Tzu,
>
> You are right about GreenGrass. AFAIK, they are not using Docker in their
> solution. BTW, this brings about some limitations: e.g., they run Python
> lambdas in GreenGrass, while OpenWhisk at the edge will be able to run any
> container, just like it happens in the cloud, which makes it a polyglot
> capability.
>
> Azure Functions on IoT Edge uses containers. So, the approaches differ :)
> In general, I agree: containers are there for isolation. If edge is viewed
> as a cloud extension, then a typical use case might be migrating user's
> containers from the cloud to edge to save bandwidth, for example. This
> includes migrating a serverless workload to the edge more or less as is.
> So, at the moment we just want to lay a first brick to enable this.
>
> Concerning the cold start, I agree that this is a problem and it's more
> pronounced in the edge than in the cloud. But if we ignore this problem
> for a moment, we still get two benefits (out of 3 that you emphasize):
> autonomy and lower bandwidth by just allowing OW to run at the edge.
>
> I agree that considering alternatives to containers when putting
> serverless at the edge makes a lot of sense in the long run (or maybe even
> medium term) and will be happy to discuss this.
>
> Cheers.
>
>
> -- david
>
>
>
>
> From:   TzuChiao Yeh 
> To: dev@openwhisk.apache.org
> Date:   17/07/2018 05:49 PM
> Subject:Re: Proposing Lean OpenWhisk
>
>
>
> Hi David,
>
> Looks cool! Glad to see OpenWhisk step forward to the edge use case.
>
> Simple question: have you considered removing Docker containers (giving up
> that isolation)?
>
> Since it's closed source, I'm not sure how AWS Greengrass does it, but it
> seems no Docker is installed at all.
>
> Edge computing is attractive for a few reasons:
> 1. bandwidth reduction.
> 2. lower latency.
> 3. offline computing capability (not for all scenarios, but this is indeed
> what AWS Greengrass claims).
>
> We can first ignore the use cases that require ultra-low latency (i.e.
> interactive AR/VR, speech/language translation). But for general use cases,
> the cold-start problem in serverless makes the low-latency argument moot:
> there's only about 100-200 ms RTT from device to cloud, but container
> creation/deletion takes much longer. Besides this, (some) edge devices are
> not offered as an IaaS service, so we may not even care about multi-tenancy,
> or a weaker isolation may do. What do you think?
>
> Thanks,
> Tzu-Chiao Yeh (@tz70s)
>
>
> On Tue, Jul 17, 2018 at 9:43 PM David Breitgand 
> wrote:
>
> > Sure. Will do directly on Wiki.
> > Cheers.
> >
> > -- david
> >
> >
> >
> >
> > From:   "Markus Thoemmes" 
> > To: dev@openwhisk.apache.org
> > Date:   17/07/2018 04:31 PM
> > Subject:Re: Proposing Lean OpenWhisk
> >
> >
> >
> > Hi David,
> >
> > I absolutely agree, this should not be held back. It'd be great if you
> > could chime in on the discussion I opened on the new proposal regarding
> > your use-case though. It might be nice to verify a similar topology as
> you
> > are proposing is still implementable or maybe even easier to implement
> > when moving to a new architecture, just so we have all requirements to
> it
> > on the table.
> >
> > I agree it's entirely orthogonal though and your proposal can be
> > implemented/merged independent of that.
> >
> > Cheers,
> > Markus
> >
> >
> >
> >
> >
> >
>
>
>
>
>

-- 
Tzu-Chiao Yeh (@tz70s)


Re: Proposing Lean OpenWhisk

2018-07-17 Thread TzuChiao Yeh
Hi David,

Looks cool! Glad to see OpenWhisk step forward to the edge use case.

Simple question: have you considered removing Docker containers (giving up
that isolation)?

Since it's closed source, I'm not sure how AWS Greengrass does it, but it
seems no Docker is installed at all.

Edge computing is attractive for a few reasons:
1. bandwidth reduction.
2. lower latency.
3. offline computing capability (not for all scenarios, but this is indeed
what AWS Greengrass claims).

We can first ignore the use cases that require ultra-low latency (i.e.
interactive AR/VR, speech/language translation). But for general use cases,
the cold-start problem in serverless makes the low-latency argument moot:
there's only about 100-200 ms RTT from device to cloud, but container
creation/deletion takes much longer. Besides this, (some) edge devices are not
offered as an IaaS service, so we may not even care about multi-tenancy, or a
weaker isolation may do. What do you think?

Thanks,
Tzu-Chiao Yeh (@tz70s)


On Tue, Jul 17, 2018 at 9:43 PM David Breitgand  wrote:

> Sure. Will do directly on Wiki.
> Cheers.
>
> -- david
>
>
>
>
> From:   "Markus Thoemmes" 
> To: dev@openwhisk.apache.org
> Date:   17/07/2018 04:31 PM
> Subject:Re: Proposing Lean OpenWhisk
>
>
>
> Hi David,
>
> I absolutely agree, this should not be held back. It'd be great if you
> could chime in on the discussion I opened on the new proposal regarding
> your use-case though. It might be nice to verify a similar topology as you
> are proposing is still implementable or maybe even easier to implement
> when moving to a new architecture, just so we have all requirements to it
> on the table.
>
> I agree it's entirely orthogonal though and your proposal can be
> implemented/merged independent of that.
>
> Cheers,
> Markus
>
>
>
>
>
>


Re: Proposal on a future architecture of OpenWhisk

2018-07-17 Thread TzuChiao Yeh
Hi Markus,

Yes, I agree that storing activation records should be a separate discussion.
Piping activation records into the logging system (Elasticsearch, Kibana)
will be cool!

But I don't think that's what I was asking about now; thanks for pointing
these out though, they look interesting.

I think there was some misunderstanding on my side. Originally, I was
considering some edge cases where an invoker fails while responding with the
active-ack, but there's no recovery/retry logic for that today (hence the
so-called best-effort). Whether to support stronger execution guarantees may
not need to be discussed here now, but the mechanism will indeed differ
depending on whether we bypass Kafka or not.

Thanks for answering me anyway,
Tzuchiao


On Tue, Jul 17, 2018 at 4:49 PM Markus Thoemmes 
wrote:

> Hi Tzu-Chiao,
>
> great questions, although I'd relay those into a separate discussion. The
> proposed design does not intend to change the way we provide
> observability via persisting activation records. The controller takes
> that responsibility in the design.
>
> It is fair to open a discussion on what our plans for the activation
> record itself are though, in the future. There is a lot of work going on in
> that area currently, with Vadim implementing user-facing metrics (which can
> serve of part of what activation records do) and James implementing
> different ActivationStores with the intention to eventually moving
> activation records to the logging system.
>
> Another angle here is that both of these solutions drop persistence of the
> activation result by default, since it is potentially a large blob.
> Persisting logs into CouchDB doesn't really scale either so there are a
> couple of LogStores to shift that burden away. What remains is largely a
> small, bounded record of some metrics per activation. I'll be happy to see
> a separate proposal + discussion on where we want to take this in the
> future :)
>
> Cheers,
> Markus
>
>


Re: Proposal on a future architecture of OpenWhisk

2018-07-17 Thread TzuChiao Yeh
Hi Markus,

Awesome work! Thanks for doing this.

One simple question here: since actions are now invoked directly via HTTP
calls, do we still persist activations (i.e. duplicate activations into some
storage)? Given that we already provide "best-effort" invocation for users,
I'm not sure persistence is still worth doing. Or maybe we can provide some
guarantee options in the future?

Thanks,
Tzu-Chiao Yeh (@tz70s)


On Tue, Jul 17, 2018 at 12:42 AM Markus Thoemmes 
wrote:

> Hi Chetan,
>
> > Hi Thomas,
>
> It's Markus Thömmes/Thoemmes respectively :)
>
> > Is this routing round robin for per namespace + action name url or is
> > it for any url? For e.g. if we have controller c1-c3 and request come
> > in order a1,a2,a3, a1 which controller would be handling which action
> > here?
>
> It's for any URL. I'm not sure the general front-door (nginx in our case)
> supports keyed round-robin/least-connected. For sanity, I basically assume
> that every request can land on any controller with no control of how that
> might happen.
>
> Cheers,
> Markus
>
>


Re: Limit of binary actions

2018-07-06 Thread TzuChiao Yeh
Hi Christian and Carlos,

From my past experiments, there may still be some hard-coded limits in the
action runtimes and the CouchDB HTTP request size if we allow the action size
limit to be fully configurable.
I've tried a similar approach in
https://github.com/apache/incubator-openwhisk/pull/3757
but I think you've done this better, so I'll close mine.

Thanks,
Tzu-Chiao

Carlos Santana wrote on Fri, Jul 6, 2018 at 7:43 PM:

> Thanks Christian
>
> +2
> This PR in addition of fixing the bug it also adds an improvement that the
> limit size is not longer hardcoded and is now a configuration setting that
> the operator deploying can customize.
>
> - Carlos Santana
> @csantanapr
>
> > On Jul 6, 2018, at 7:44 AM, Christian Bickel  wrote:
> >
> > Hey,
> >
> > a few days ago we found a little bug:
> >
> > On creating an action, the code is sent to the controller. Afterwards
> > the controller checks that the code is not too big. If the code
> > itself is sent directly (e.g. uploading a single `.js` file), the
> > limit is 48MB (today).
> > If an archive is uploaded, it will be encoded with base64. The problem
> > here is that base64 has an overhead of 1/3. This means that if you have
> > an archive with a size of e.g. 45MB, 60MB will be sent to the
> > controller. So the request will be rejected. This is not what the user
> > expects, as a limit of 48MB was advertised to them.
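> >
> > Just to make the overhead concrete, a tiny illustration (plain Scala, not
> > the controller's actual check):
> >
> >     // base64 turns every 3 input bytes into 4 output characters
> >     def base64Size(rawBytes: Long): Long = 4 * math.ceil(rawBytes / 3.0).toLong
> >
> >     base64Size(45L * 1024 * 1024) / (1024.0 * 1024) // ~60 MB sent for a 45MB archive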
> >
> > I opened a PR to fix this behaviour:
> > https://github.com/apache/incubator-openwhisk/pull/3835
> >
> > As this change, potentially raises the action size in the database, I
> > wanted to ask if anyone has any concerns with this PR.
> >
> > If not, I will go forward with it.
> >
> > Greetings
>


Re: New scheduling algorithm proposal.

2018-06-24 Thread TzuChiao Yeh
 by independent schedulers.

5.Hybrid: Combining distributed architecture with monolithic or
shared-state designs.



I'm still studying some academic references and mapping them to the OW and
serverless architecture, since I have no experience and am not that familiar
with large-scale distributed cluster scheduling (and no resources, either).
I'm just trying to sort out my thoughts, so please let me know if I'm
understanding something wrong. I think there are already some folks here with
a lot of experience and research on large-scale distributed cluster
scheduling; it would be nice if we could discuss this more.


Thanks!

TzuChiao


[1] http://firmament.io/blog/scheduler-architectures.html




Dominic Kim wrote on Sat, Jun 9, 2018 at 12:24 AM:

> Hi TzuChiao
>
> Those are great and fair comments!
>
> 1. Kafka utilization.
>
> Regarding what is written in link[1], it's correct.
> More partitions lead to higher throughput and latency.
>
> But the number of partitions in one Kafka node is limited to some level in
> my proposal.
> It does not increase infinitely as we would add more servers to support
> more concurrent actions.
> As I shared in my previous email, all partitions(no matter of topic) would
> be evenly distributed among nodes.
> So overhead from multiple partitions can be limited in one node, and
> topic-wise overhead can be distributed among multiple nodes.
>
> With regard to your question on batch processing, yes, current MessageFeed
> fetches messages in batch.
> But inherently activation processing can't be done in batch.
> Even though it fetches a bunch of messages, invoker should(and does) handle
> activation in serial order(concurrently).
> Because if they commit offsets in batch, there is a possibility to invoke
> actions multiple times in case of failure.
> In turn, committing offset is done one by one.
>
> If you are only mentioning about fetching multiple messages, that can be
> achieved in my proposal as well because consumers are dedicated to a given
> topic and it's safe to fetch multiple messages.
> (However, committing offset still works in serial)
>
> Each consumer in the same group will be assigned one partition respectively.
> And they can fetch as many messages as they want if they commit offsets one
> by one.
>
> 2. Resource-aware scheduling.
>
> Actually, I think optimal scheduling based on real resource usage is not
> feasible in a current architecture.
> This is because real resource resides in invoker side, but scheduling
> decision is made by controllers.
> And resource status can change in 5ms ~ 10ms as the execution time of
> action can be very less.
> (I observed 2ms execution time for some actions.)
>
> So all invokers should share their extremely frequently changing status to
> all controllers,
> and a controller should schedule activations to optimal invokers along with
> considering other controllers.
> (Because there is a possibility that other controllers can also schedule
> activations to same the invoker.)
> And all these procedures should be done within lesser than 5ms.
>
> So I think there will always be some gap between real resource usages and
> status kept by controllers.
> And this is the reason why I made a proposal which utilizes an asynchronous
> way.
>
> Regarding sharding concept at loadbalancer, I just refer to it to minimize
> the intervention among controllers when scheduling.
> Since scheduling for container creation and activations are segregated, and
> container creation process can work in an asynchronous way, I think that
> would be enough.
> If there is a better way to handle this, that would be better.
>
> Finally, regarding the word, autonomous, I just named it because a
> container itself can fetch and handle activation messages without any
> intervention of invokers : )
>
> Anyway, thank you for very valuable comments.
> I hope this would help.
>
> Best regards
> Dominic.
>
>
>
>
> 2018-06-07 21:55 GMT+09:00 TzuChiao Yeh :
>
> > Hi Dominic,
> >
> > I really like your proposal! Thanks for your awesome presentation and
> > materials, help me a lot.
> >
> > I have some opinions and questions here about the proposal and previous
> > discussions:
> >
> > 1. About kafka utilization:
> >
> > First of all, bypassing invokers is a great idea, though this will lead
> to
> > lots of hard work on various runtime. The possibility on utilizing "hot"
> > containers and parallelism also looks well.
> >
> > I'm not a kafka expert at all, but there are some external references
> > talking about large number of partitions. [1]
> >
> > One question here -
> >
> > From my own investigation, current implementation on messaging (consume

Re: New scheduling algorithm proposal.

2018-06-07 Thread TzuChiao Yeh
Hi Dominic,

I really like your proposal! Thanks for your awesome presentation and
materials; they helped me a lot.

I have some opinions and questions here about the proposal and previous
discussions:

1. About kafka utilization:

First of all, bypassing invokers is a great idea, though this will lead to
lots of hard work on the various runtimes. The possibility of utilizing "hot"
containers and parallelism also looks good.

I'm not a Kafka expert at all, but there are some external references
discussing large numbers of partitions [1].

One question here -

From my own investigation, the current messaging (consumer) implementation
uses batching (of offsets) to enhance performance. Your proposed algorithm
assigns each topic to an action, or more accurately speaking, each partition
to a running container. That might drop the ability to use batching, and I'm
not sure how much overhead that would add? Or is there perhaps some more
advanced design for balancing the number of partitions and offsets?
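
For concreteness, my understanding of the per-topic consumption in a rough
sketch (the names and consumer wiring are just my assumption, and it uses the
Kafka 2.x consumer API):

    import java.time.{Duration => JDuration}
    import java.util.Collections
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
    import org.apache.kafka.common.TopicPartition

    // poll() still fetches a batch from the action's partition, but offsets are
    // committed one record at a time so a crash never acknowledges activations
    // that were not handled yet.
    def consumeSerially(consumer: KafkaConsumer[String, String], topic: String)
                       (handle: String => Unit): Unit = {
      consumer.subscribe(Collections.singletonList(topic))
      val records = consumer.poll(JDuration.ofMillis(100)).asScala
      records.foreach { record =>
        handle(record.value)
        val tp = new TopicPartition(record.topic, record.partition)
        consumer.commitSync(
          Collections.singletonMap(tp, new OffsetAndMetadata(record.offset + 1)))
      }
    }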

2. Controller side:

I'm more interested in the future enhancement of exposing more state (from
invokers) to the controllers, i.e. more accurate resource-aware scheduling
(e.g. the memory-aware scheduling Markus proposed earlier on the dev list),
package-aware scheduling for package caching [2], and controlling a running
container's reserved time as you mentioned -> warm/hot container utilization.

At first I was confused about how the word "autonomous" maps to your
algorithm; I think the scheduling logic still sits on the controller side (for
checking lags, limits and so on).
It seems the proposed algorithm will not drop the current sharding
load balancer. Can you share more experience or plans on integrating these on
the controller side?

[1]
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
[2] https://dl.acm.org/citation.cfm?id=3186294

Thanks,
TzuChiao

Dominic Kim wrote on Thu, Jun 7, 2018 at 1:08 AM:

> Sorry.
> Let me share Kafka benchmark results again.
>
> | # of topics | Kafka TPS |
> |          50 |    34,488 |
> |         100 |    34,502 |
> |         200 |    31,781 |
> |         500 |    30,324 |
> |        1000 |    30,855 |
>
> Best regards
> Dominic
>
>
> 2018-06-07 2:04 GMT+09:00 Dominic Kim :
>
> > Sorry for the late response Tyson.
> >
> > Let me first answer your second question.
> > Vuser is just the number of threads to send the requests.
> > Each Vusers randomly picked the namespace and send the request using REST
> > API.
> > So they are independent of the number of namespaces.
> >
> > And regarding performance degradation on the number of users, I think it
> > works a little bit differently.
> > Even though I have only 2 users(namespaces), if their homeInvoker is
> same,
> > TPS become very less.
> > So it is a matter of how many actions share the same homeInvoker, though
> > having more users than containers can also harm the performance.
> > This is because controller should send those actions to the same invoker
> > even though there are other idle invokers.
> > In my proposal, controllers can schedule activation to any invokers so it
> > does not happen.
> >
> >
> > And regarding the issue about the sheer number of Kafka topics, let me
> > share my idea.
> >
> > 1. Data size is not changed.
> >
> > If we have 1000 activation requests, they will be spread among invoker
> > topics. Let's say we have 10 invokers, then ideally each topic will have
> > 100 messages.
> > In my proposal, if I have 10 actions, each topic will have 100 messages
> as
> > well.
> > Surely there will be more number of actions than the number of invokers,
> > data will be spread to more topics, but data size is unchanged.
> >
> > 2. Data size depends on the number of active actions.
> >
> > For example, if we have one million actions, in turn, one million topics
> > in Kafka.
> > If only half of them are executed, then there will be data only for half
> > of them.
> > For rest half of topics, there will be no data and they won't affect the
> > performance.
> >
> > 3. Things to concern.
> >
> > Let me describe what happens if there are more number of Kafka topics.
> >
> > Let's say there are 3 invokers with 5 activations each in the current
> > implementation, then it would look like this.
> >
> > invoker0: 0 1 2 3 4 5 (5 messages) -> consumer0
> > invoker1: 0 1 2 3 4 5 -> consumer1
> > invoekr2: 0 1 2 3 4 5 -> consumer2
> >
> > Now If I have 15 actions with 15 topics in my proposal.
> >
> > action0: 0  -> consumer0
> > action1: 0  -> consumer1
> > action2: 0  -> consumer2
> > action3: 0  -> consumer3
> > .
> > .
> > .
> > action14: 0  -> consumer14
> >
> > Kafka utilizes page cache to maximize the performance.
> > Since the size of data is not changed, data kept in page cache is also
> not
> > changed.
> > But the number of parallel access to data is increased. I think it might
> > be some overhead.
> >
> > That's the reason why I performed benchmark with multiple topics.
> >
> > # of topics
> >
> > Kafka TPS
> >