Hi Hitesh,

Are you expecting ZooKeeper instances to run on the login nodes of the supercomputers? If so, I am not sure how feasible that will be. I guess Suresh can give more details (using the obsolete GSI-SSH as an example).

Thanks
-AJ

On Thu, Feb 22, 2018 at 10:02 PM, Hitesh Kumar Dasika <hdas...@umail.iu.edu> wrote:

> Hello Devs,
>
> Here is another document detailing the design specs of the problem
> discussed in previous emails. This document also contains answers to the
> questions asked by Dimuthu (Thank you!).
>
> Any corrections, criticism or appreciation will be greatly helpful.
>
> Google Doc Link:
> https://docs.google.com/document/d/1zVP7SSelxaGwTTOHL70ZOy4Zl3egkGx6r7TqBMqzapg/edit?usp=sharing
>
> On Mon, Feb 12, 2018 at 9:58 AM, Dimuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:
>
>> Hi Hitesh,
>>
>> This is overall a good design. I have a few areas that need further
>> clarification.
>>
>> 1. Basically, this design supports one-way communication: Airavata sends
>> commands and agents execute them. But we have scenarios where agents should
>> respond to the commands. For example, Airavata sends a list-files command
>> and the agent should respond with the list of files. There could also
>> be cases where the response is asynchronous, so that Airavata does not
>> immediately get the response. How do you handle such scenarios?
>> 2. When you are implementing queues in the external server, do you keep
>> one queue per compute resource, or do you use a single queue for all
>> compute resources?
>> 3. Can we have multiple external servers for high availability? If so, how
>> do you maintain coordination among multiple external servers?
>> 4. Did you consider other queue implementations like Kafka? If so, what
>> advantage do you get by using RabbitMQ over it?
>> 5. We might have to write the same agents in different languages (Python, C,
>> Java) depending on the support of the compute resource. Please verify that
>> the client libraries that you use for queue interactions support that.
>> 6. What is the process for registering or removing a compute resource
>> from the intranet (creating or deleting queues), and who is responsible for
>> that?
>>
>> Thanks
>> Dimuthu
>>
>> On Thu, Feb 8, 2018 at 6:03 PM, Hitesh Kumar Dasika <hdas...@umail.iu.edu> wrote:
>>
>>> Dev,
>>>
>>> I am looking at a mechanism that can be used to establish a
>>> communication architecture between a set of *intranet* nodes in a
>>> cluster and Airavata.
>>>
>>> *Problem Introduction:*
>>>
>>> In some cases, a cluster or an HPC system contains nodes
>>> or machines in the intranet that cannot be accessed directly through the
>>> HPC system's endpoints. However, these machines inside the intranet can
>>> communicate with the external world or Internet. They are also
>>> precious resources that can be used for job execution. Hence, there needs
>>> to be a proper architecture in place to make use of those resources. Here
>>> is a brief architectural discussion of this particular problem.
>>>
>>> *Google Doc Link:*
>>> https://docs.google.com/document/d/11I5mboZmI_D_IocP-CfjJiNoD55qtVSLcpWGodAL0z0/edit?usp=sharing