PDF version

On Tue, Aug 15, 2017 at 2:22 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> I also attached the discarded version of the design here.
>
> Best,
>
> Nan
>
> On Tue, Aug 15, 2017 at 2:20 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>
>> Hi, Marcelo,
>>
>> Yes, essentially it is using multiple threads talking with YARN.
>>
>> The key design consideration here is how you model the state of
>> applications. If it lives in an actor, there is no synchronization
>> involved, which yields a cleaner design; if it lives in a shared data
>> structure, you have to be careful about coordinating threads (we
>> actually had a design based on a shared data structure, which we
>> eventually discarded to pursue a cleaner one).
>>
>> I think a bulk API can make life easier compared to the shared data
>> structure, but it raises two questions:
>>
>> 1. Are we going to update all applications at a uniform pace, even if
>> they were submitted at different times?
>>
>> 2. Are we going to use a single thread for everything, including
>> sending requests, receiving responses, parsing, etc.?
>>
>> And we would still need to deal with some synchronization.
>>
>> What do you think?
>>
>> Best,
>>
>> Nan
>>
>> On Tue, Aug 15, 2017 at 11:53 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> Hmm, I remember this... it was left as a "todo" item when the app
>>> monitoring was added.
>>>
>>> The document you wrote seems to be a long way of saying you'll have a
>>> few threads talking to YARN and updating the state of application
>>> handles in Livy. Is that right?
>>>
>>> I would investigate whether there's any API in YARN to do a bulk get
>>> of running applications with a particular filter; then you could make
>>> a single call to YARN periodically to get the state of all apps that
>>> Livy started.
>>>
>>>
>>> On Mon, Aug 14, 2017 at 2:35 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>>> > Hi, all
>>> >
>>> > In HDInsight, we (Microsoft) use Livy as the Spark job submission
>>> > service. We keep seeing customers run into problems when they submit
>>> > many concurrent applications to the system, or recover Livy from a
>>> > state with many concurrent applications.
>>> >
>>> > By looking at the code and the customers' exception stacks, we
>>> > narrowed the problem down to the application monitoring module,
>>> > where a new thread is created for each application.
>>> >
>>> > To resolve the issue, we propose an actor-based design of the
>>> > application monitoring module and share it here (as the new JIRA
>>> > does not seem to be working yet):
>>> > https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-nbTQTdDFXl2XQhXDiwA/edit?usp=sharing
>>> >
>>> > We are glad to hear feedback from the community and improve the
>>> > design before we start implementing it!
>>> >
>>> > Best,
>>> >
>>> > Nan
>>>
>>>
>>> --
>>> Marcelo
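
For reference, the single-caller bulk-poll idea discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions, not Livy code: `AppHandle`, `BulkAppMonitor`, and `fetch_states` are hypothetical names, and `fetch_states` stands in for whatever bulk query against YARN turns out to be available. One background thread makes one call per interval and updates every registered handle, instead of one thread per application.

```python
import threading
from typing import Callable, Dict, List


class AppHandle:
    """Hypothetical stand-in for an application handle kept by the server."""

    def __init__(self, app_id: str) -> None:
        self.app_id = app_id
        self.state = "UNKNOWN"


class BulkAppMonitor:
    """Polls the state of all registered apps with one bulk call per interval.

    fetch_states is an assumed callable: it takes a list of app ids and
    returns a dict mapping app id to a state string. No real YARN API is
    called here.
    """

    def __init__(self, fetch_states: Callable[[List[str]], Dict[str, str]],
                 interval_s: float = 1.0) -> None:
        self._fetch_states = fetch_states
        self._interval_s = interval_s
        self._handles: Dict[str, AppHandle] = {}
        self._lock = threading.Lock()   # guards the shared handle map
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, handle: AppHandle) -> None:
        with self._lock:
            self._handles[handle.app_id] = handle

    def poll_once(self) -> None:
        # Snapshot the ids under the lock, then do the slow call without it.
        with self._lock:
            app_ids = list(self._handles)
        if not app_ids:
            return
        states = self._fetch_states(app_ids)  # the single bulk request
        with self._lock:
            for app_id, state in states.items():
                handle = self._handles.get(app_id)
                if handle is not None:
                    handle.state = state

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.poll_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Note that the sketch still needs a lock around the handle map, which is the residual synchronization mentioned in the thread, and that every application is refreshed at the same pace regardless of when it was submitted.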