Vidya,

I'm not sure how relevant it is, but it occurs to me that a microservice that executes jobs on a cloud requires very little in terms of resources to submit and monitor that job on the cloud; it doesn't really matter whether the job is a "big" or a "small" job. So I'm not sure what heuristic makes sense for distributing work to these job execution microservices. Maybe a simple round-robin approach would be sufficient.
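To make the round-robin heuristic concrete, such a dispatcher really is only a few lines. The following is a minimal sketch in Python; the worker endpoint names and the way a job reaches a worker are hypothetical placeholders, not anything in Airavata:

    import itertools

    # Hypothetical endpoints for identical job-execution microservice instances.
    WORKERS = ["http://job-exec-1:8080", "http://job-exec-2:8080", "http://job-exec-3:8080"]

    def round_robin_dispatch(jobs, workers=WORKERS):
        """Assign each incoming job to the next worker in a fixed cycle."""
        ring = itertools.cycle(workers)
        for job in jobs:
            # In a real deployment this would be an RPC call or a message
            # published to the chosen worker's queue; here we just pair them up.
            yield job, next(ring)

    for job, worker in round_robin_dispatch(["job-%d" % i for i in range(6)]):
        print(job, "->", worker)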
I do think a job scheduling algorithm makes sense, however, for a higher-level component: some sort of metascheduler that understands what resources are available on the cloud systems the jobs will be running on. The metascheduler could create work for the job execution microservices to run on particular cloud resources in a way that optimizes for some metric (e.g., throughput).

Thanks,
Marcus

On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vkalv...@umail.iu.edu> wrote:

Ajinkya,

My scenario is workload distribution among multiple instances of the same microservice. If a message broker needs to distribute the available jobs among multiple workers, the common approach would be round robin or a similar algorithm. That approach works best when all the workers are similar and the jobs are of equal size. So I think a genetic or heuristic job scheduling algorithm, one that is also aware of each worker's current state (CPU, RAM, number of jobs in progress), can distribute the jobs more efficiently. The workers can periodically ping the message broker with their current state information.

The other advantage of using a customized algorithm is that it can be tweaked to use routing, priority, or other information embedded in the job metadata to resolve the concerns raised by Amruta, viz. message grouping, ordering, repeated messages, etc. We can even ensure data privacy, e.g. if the workers are spread across multiple compute clusters, say AWS and IU Big Red, and we want to restrict certain sensitive jobs to run only on Big Red.

Some distributed job scheduling algorithms for cloud computing:
* http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
* https://arxiv.org/pdf/1404.5528.pdf

Regards,
Vidya Sagar
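As a rough illustration of the state-aware dispatch Vidya describes, here is a minimal sketch; the heartbeat fields, the scoring rule, and the cluster names are assumptions made for illustration, not a proposed wire format:

    import time
    from dataclasses import dataclass, field

    @dataclass
    class WorkerState:
        # Hypothetical heartbeat payload a worker would periodically send.
        worker_id: str
        cpu_load: float        # 0.0 .. 1.0
        free_ram_mb: int
        jobs_in_progress: int
        cluster: str           # e.g. "aws" or "bigred"
        last_seen: float = field(default_factory=time.time)

    class StateAwareScheduler:
        """Picks the least-loaded live worker, honoring placement constraints."""

        def __init__(self, heartbeat_timeout=30.0):
            self.workers = {}
            self.heartbeat_timeout = heartbeat_timeout

        def heartbeat(self, state):
            # Workers ping periodically; keep only their latest reported state.
            self.workers[state.worker_id] = state

        def pick_worker(self, job):
            now = time.time()
            candidates = [
                w for w in self.workers.values()
                if now - w.last_seen < self.heartbeat_timeout
                # Data-privacy constraint: sensitive jobs stay on the allowed cluster.
                and (not job.get("sensitive") or w.cluster == job.get("required_cluster"))
            ]
            if not candidates:
                raise RuntimeError("no eligible worker available")
            # Simple heuristic: fewest running jobs, then lowest CPU load.
            return min(candidates, key=lambda w: (w.jobs_in_progress, w.cpu_load))

    sched = StateAwareScheduler()
    sched.heartbeat(WorkerState("w1", cpu_load=0.7, free_ram_mb=2048, jobs_in_progress=5, cluster="aws"))
    sched.heartbeat(WorkerState("w2", cpu_load=0.2, free_ram_mb=8192, jobs_in_progress=1, cluster="bigred"))
    print(sched.pick_worker({"sensitive": True, "required_cluster": "bigred"}).worker_id)  # -> w2

Picking the worker with the fewest running jobs (with CPU load as a tie-breaker) is just one possible heuristic; a genetic or more elaborate scheduler would plug in at the pick_worker step.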
On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <arka...@indiana.edu> wrote:

Hello all,

Adding more information to the message-based approach. Messaging is a key strategy employed in many distributed environments. Message queuing is ideally suited to performing asynchronous operations: a sender can post a message to a queue without waiting while the message is retrieved and processed, and the sender and receiver do not even have to be running concurrently.

With message queuing there are two possible scenarios:
1. Sending and receiving messages using a single message queue.
2. Sharing a message queue between many senders and receivers.

When a message is retrieved, it is removed from the queue. A message queue may also support message peeking. This mechanism is useful if several receivers are retrieving messages from the same queue, but each receiver only wishes to handle specific messages. The receiver can examine the message it has peeked and decide whether to retrieve it (which removes it from the queue) or leave it on the queue for another receiver to handle.

A few basic message queuing patterns are:
1. One-way messaging: the sender simply posts a message to the queue in the expectation that a receiver will retrieve it and process it at some point.
2. Request/response messaging: the sender posts a message to a queue and expects a response from the receiver. The sender can resend if the message is not delivered. This pattern typically requires some form of correlation to enable the sender to determine which response message corresponds to which request.
3. Broadcast messaging: the sender posts a message to a queue, and multiple receivers can read a copy of the message. This pattern depends on the message queue being able to disseminate the same message to multiple receivers. Senders post messages that include metadata in the form of attributes; each receiver can create a subscription to the queue, specifying a filter that examines the values of the message attributes, and any messages posted with attribute values that match the filter are automatically forwarded to that subscription.

A solution based on asynchronous messaging might need to address a number of concerns:
* Message ordering and grouping: process messages either in the order they are posted or in a specific order based on priority. There may also be occasions when it is difficult to eliminate dependencies, and it becomes necessary to group messages together so that they are all handled by the same receiver.
* Idempotency: ideally the message processing logic in a receiver should be idempotent so that, if the work performed is repeated, the repetition does not change the state of the system.
* Repeated messages: some message queuing systems implement duplicate message detection and removal based on message IDs.
* Poison messages: a poison message is one that cannot be handled, often because it is malformed or contains unexpected information.
* Message expiration: a message might have a limited lifetime; if it is not processed within this period it might no longer be relevant and should be discarded.
* Message scheduling: a message might be temporarily embargoed and should not be processed, or even be visible to a receiver, until a specific date and time.

Thanks,
Amruta Kamat

________________________________
From: Shenoy, Gourav Ganesh <goshe...@indiana.edu>
Sent: Thursday, February 2, 2017 7:57 PM
To: dev@airavata.apache.org
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Amila, Sagar, thank you for the responses and for raising those concerns, and apologies, because my email framed the topic of workload management in terms of how micro-services communicate. As Ajinkya rightly mentioned, there is a correlation between how micro-services communicate and how each micro-service performs its work under those circumstances. The goal is to have maximum independence between micro-services and to investigate the workflow pattern in which these micro-services will operate, so that we can find the right balance between availability and consistency. Again, from our preliminary analysis we expect these solutions may not be generic; the specific use case will play a big decisive role.

For starters, we are focusing on the following example, which I think will clarify what exactly we are trying to investigate.

Our test example
Say we have the following 4 micro-services, each performing the specific task mentioned in its box.
<image001.png>

A stateful pattern to distribute work
<image002.png>
Here each communication between micro-services could be via RPC or messaging (e.g. RabbitMQ). The obvious disadvantage is that if any micro-service is down, system availability is at stake. In this test example, Microservice-A coordinates the work and maintains the state information.
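To make the stateful variant concrete, here is a minimal sketch of a coordinator in the spirit of Microservice-A, assuming it invokes the other services in order and keeps all per-request state itself; the service names and calls are placeholders for illustration, not the real Airavata interfaces:

    # Sketch of the stateful coordination pattern: one service (the role of
    # Microservice-A) calls the others in order and keeps all per-request state.
    class Coordinator:
        def __init__(self, services):
            self.services = services   # ordered list of (name, callable) pairs
            self.state = {}            # request_id -> status and intermediate results

        def handle(self, request_id, payload):
            self.state[request_id] = {"status": "STARTED", "results": []}
            data = payload
            for name, call in self.services:
                try:
                    data = call(data)  # an RPC or message round-trip in practice
                except Exception as err:
                    self.state[request_id]["status"] = "FAILED at %s: %s" % (name, err)
                    raise
                self.state[request_id]["results"].append((name, data))
            self.state[request_id]["status"] = "COMPLETED"
            return data

    # Example usage with stand-in services:
    coordinator = Coordinator([
        ("validate", lambda d: {**d, "valid": True}),
        ("stage-input", lambda d: {**d, "staged": True}),
        ("submit-job", lambda d: {**d, "job_id": "1234"}),
    ])
    print(coordinator.handle("req-1", {"experiment": "exp-42"}))

Because the coordinator holds the only copy of the state, its failure stalls every in-flight request, which is exactly the availability concern raised above.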
A stateless pattern to distribute work
<image003.png>
Another, purely asynchronous, approach would be to associate a message queue with each micro-service: each micro-service performs its task, submits a request (a message on the bus) to the next micro-service, and continues processing more requests. This gives better availability, though we would need to handle corner cases for failures such as the message broker being down, message loss, etc.

As mentioned, these are just a few proposals that we are planning to investigate via a prototype project, injecting corner cases/failures and trying to find ways to handle them. I would love to hear more thoughts/questions/suggestions.

Thanks and Regards,
Gourav Shenoy

From: Ajinkya Dhamnaskar <adham...@umail.iu.edu>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Thursday, February 2, 2017 at 2:22 AM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata

Hello all,

Just a heads up: here the name "distributed workload management" does not necessarily mean having different instances of a microservice and then distributing work among those instances. Rather, the problem is how to make each microservice work independently on top of a concrete distributed communication infrastructure. So think of it as a workflow where each microservice does its part of the work and communicates (how is yet to be decided) its output. The next microservice identifies and picks up that output and takes it further towards the final outcome. Having said that, the crux here is that none of the microservices in the pipeline needs to worry about the others.

Vidya Sagar, I completely second your opinion on having stateless microservices; in fact, that is the key. With stateless microservices it is difficult to guarantee consistency in a system, but it solves the availability problem to some extent. I would be interested to understand what you mean by "an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information".

On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <vkalv...@umail.iu.edu> wrote:

On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <thejaka.am...@gmail.com> wrote:

Hi Gourav,

Sorry, I did not understand your question. Specifically, I am having trouble relating "workload management" to the options you suggest (RPC, message based, etc.). So what exactly do you mean by "workload management"? What is "work" in this context? Also, I did not understand what you meant by "the most efficient way". Efficient in terms of what? Are you looking at speed?

From your suggestions, it seems you are trying to find a way to communicate between micro-services. RPC might be troublesome if you need to communicate with processes separated by a firewall.

Thanks
-Thejaka
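Coming back to the stateless, queue-per-microservice pipeline that Gourav and Ajinkya describe above, here is a minimal sketch of the idea; in-process queues stand in for broker queues (e.g. RabbitMQ), and the stage names and payload fields are invented for illustration:

    # Minimal sketch of the stateless pipeline: every micro-service owns an input
    # queue, does its step, publishes the result to the next queue, and moves on.
    import queue
    import threading
    import time

    ORDER = ["validate", "stage", "submit", "monitor"]
    queues = {name: queue.Queue() for name in ORDER}

    def worker(name, next_name):
        while True:
            msg = queues[name].get()
            if msg is None:                 # shutdown signal
                break
            msg = {**msg, name: "done"}     # this service's own unit of work
            if next_name:
                queues[next_name].put(msg)  # hand off; no shared state is kept
            else:
                print("finished:", msg)

    threads = []
    for i, name in enumerate(ORDER):
        next_name = ORDER[i + 1] if i + 1 < len(ORDER) else None
        t = threading.Thread(target=worker, args=(name, next_name))
        t.start()
        threads.append(t)

    queues["validate"].put({"experiment": "exp-42"})   # a new request enters the pipeline

    # Drain and stop (in a real system the services run forever).
    time.sleep(0.5)
    for name in ORDER:
        queues[name].put(None)
    for t in threads:
        t.join()

No stage keeps request state once it has published to the next queue, so another instance consuming the same queue can take over if one dies; the corner cases mentioned above (broker down, message loss) would still need handling in a real deployment.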
On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:

Hello dev, arch,

As part of this Spring'17 Advanced Science Gateway Architecture course, we are working on trying to debate and find possible solutions to the issue of managing distributed workloads in Apache Airavata. This leads to the discussion of finding the most efficient way for the different Airavata micro-services to communicate and distribute work, such that:
1. We maintain the ability to scale these micro-services whenever needed (autoscale, perhaps?).
2. We achieve fault tolerance.
3. We can deploy these micro-services independently, or better, in a containerized manner, keeping in mind the ability to use DevOps for deployment.

As of now the options we are exploring are:
1. RPC-based communication.
2. Message-based communication: either master-worker, work-queue, etc.
3. A combination of both approaches.

I am more inclined towards exploring the message-based approach, but that raises the need to handle limitations/corner cases of the message broker, such as downtime (and possibly more). In my opinion, asynchronous communication will help us achieve most of the above-mentioned points. Another debatable issue is making the micro-service implementations stateless, so that we do not have to pass state information between micro-services.

I would love to hear any thoughts/suggestions/comments on this topic and open up a discussion via this mail thread. If there is anything relevant to this issue that I have missed, please let me know.

Thanks and Regards,
Gourav Shenoy

Hi Gourav,

Correct me if I'm wrong, but I think this is a case of the job-shop scheduling problem: we may have 'n' jobs of varying processing times and memory requirements, we have 'm' microservices with possibly different computing and memory capacities, and we are trying to minimize the makespan (https://en.wikipedia.org/wiki/Makespan).

For this use case, I'm in favor of a highly available and consistent message broker with an intelligent job scheduling algorithm, which receives real-time updates from the microservices with their current state information.

As for the stateful vs. stateless implementation, I think that question depends on the functionality of a particular microservice. In a broad sense, the stateless implementation should be preferred as it will scale better horizontally.

Regards,
Vidya Sagar

--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalv...@iu.edu

--
Thanks and regards,
Ajinkya Dhamnaskar
Student ID: 0003469679
Masters (CS)
+1 (812) 369-5416

--
Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics and Computing | Indiana University Bloomington | (812) 691-5002 | vkalv...@iu.edu