PDF version

On Tue, Aug 15, 2017 at 2:22 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

> I also attached the discarded version of the design here.
>
> Best,
>
> Nan
>
> On Tue, Aug 15, 2017 at 2:20 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>
>> Hi, Marcelo,
>>
>> Yes, essentially it is using multiple threads talking with YARN.
>>
>> The key design consideration here is how you model the state of
>> applications. If it lives in an actor, no synchronization is involved,
>> which yields a cleaner design; if it lives in a shared data structure, you
>> have to be careful about coordinating threads (we actually had a design
>> based on a shared data structure and eventually discarded it to pursue a
>> cleaner one).
>>
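A minimal sketch of the actor-style modelling described above, in Python for brevity. It is not the design from the linked document: the class `AppMonitorActor`, its `tell` and `stop` methods, and the state strings are hypothetical names chosen for illustration. It only shows that when one thread owns the state map and every other thread just enqueues messages, no lock is needed around the map.

```python
import queue
import threading


class AppMonitorActor:
    """Hypothetical actor: owns all application state, receives updates as messages."""

    _STOP = object()  # sentinel message that ends the actor loop

    def __init__(self):
        self._mailbox = queue.Queue()  # thread-safe FIFO mailbox
        self._states = {}              # touched only by the actor thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, app_id, state):
        """Called from any thread; enqueues an update and returns immediately."""
        self._mailbox.put((app_id, state))

    def stop(self):
        """Drain the mailbox, stop the actor, and return a copy of the final state."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return dict(self._states)

    def _run(self):
        # Messages are handled one at a time, so updates never race each other.
        while True:
            msg = self._mailbox.get()
            if msg is self._STOP:
                return
            app_id, state = msg
            self._states[app_id] = state


actor = AppMonitorActor()
actor.tell("app_1", "ACCEPTED")
actor.tell("app_2", "ACCEPTED")
actor.tell("app_1", "RUNNING")
final_states = actor.stop()
```

With a shared data structure instead, every reader and writer of the state map would have to take the same lock, which is the coordination cost mentioned above.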
>> I think a bulk API can make life easier compared to the shared data
>> structure, but it raises two questions:
>>
>> 1. Are we going to update all applications at a uniform pace, even if they
>> are submitted at different times?
>>
>> 2. Are we going to use a single thread for everything, including
>> sending requests, receiving responses, parsing, etc.?
>>
>> And we would still need to deal with some synchronization.
>>
>> What do you think?
>>
>> Best,
>>
>> Nan
>>
>> On Tue, Aug 15, 2017 at 11:53 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> Hmm, I remember this... it was left as a "todo" item when the app
>>> monitoring was added.
>>>
>>> The document you wrote seems to be a long way of saying you'll have a
>>> few threads talking to YARN and updating the state of application
>>> handles in Livy. Is that right?
>>>
>>> I would investigate whether there's any API in YARN to do a bulk get
>>> of running applications with a particular filter; then you could make
>>> a single call to YARN periodically to get the state of all apps that
>>> Livy started.
>>>
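A rough sketch of the single periodic bulk call suggested above, in Python for brevity. `fetch_states` is a hypothetical stand-in for whatever bulk query YARN offers; that such a filtered call exists is an assumption, and is exactly what the message proposes to investigate. `poll_once`, the handle dictionary, and the state strings are illustrative names, not Livy's.

```python
def poll_once(fetch_states, handles):
    """Refresh every tracked handle from one bulk query.

    fetch_states: hypothetical callable returning (app_id, state) pairs
                  for all applications matching the filter.
    handles:      dict mapping app_id -> last known state, updated in place.
    Returns a dict of the handles whose state changed in this poll.
    """
    reported = dict(fetch_states())  # one round trip for all applications
    changed = {}
    for app_id, old_state in handles.items():
        # Assumption: an app missing from the reply keeps its old state here;
        # a real monitor would need a follow-up query for such apps.
        new_state = reported.get(app_id, old_state)
        if new_state != old_state:
            handles[app_id] = new_state
            changed[app_id] = new_state
    return changed


def fake_fetch():
    # Stand-in for the bulk call; "app_9" was not started by this server
    # and is therefore ignored by poll_once.
    return [("app_1", "RUNNING"), ("app_2", "RUNNING"), ("app_9", "RUNNING")]


tracked = {"app_1": "ACCEPTED", "app_2": "RUNNING", "app_3": "RUNNING"}
updates = poll_once(fake_fetch, tracked)
```

A single scheduler thread could call `poll_once` on a timer, so the number of threads stays constant no matter how many applications are tracked.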
>>>
>>> On Mon, Aug 14, 2017 at 2:35 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>>> > Hi, all
>>> >
>>> > In HDInsight, we (Microsoft) use Livy as the Spark job submission service.
>>> > We keep seeing customers run into problems when they submit many
>>> > concurrent applications to the system, or recover Livy from a state with
>>> > many concurrent applications.
>>> >
>>> > By looking at the code and the customers' exception stacks, we narrowed the
>>> > problem down to the application monitoring module, where a new thread is
>>> > created for each application.
>>> >
>>> > To resolve the issue, we propose an actor-based design for the application
>>> > monitoring module and share it here (as the new JIRA does not seem to be
>>> > working yet):
>>> > https://docs.google.com/document/d/1yDl5_3wPuzyGyFmSOzxRp6P-nbTQTdDFXl2XQhXDiwA/edit?usp=sharing
>>> >
>>> > We are glad to hear feedback from the community and improve the design
>>> > before we start implementing it!
>>> >
>>> > Best,
>>> >
>>> > Nan
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>
