On Wed, Aug 16, 2017 at 9:33 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>> What I proposed is having a single request to YARN to get all applications'
>> statuses, if that's possible. You'd still have multiple application handles
>> that are independent of each other. They'd all be updated separately from
>> that one thread talking to YARN. This has nothing to do with a "shared data
>> structure". There's no shared data structure here to track application
>> status.
>
> You are still avoiding the questions how you make all "application handles"
> accessible to this thread

I really don't understand what you mean. You need somewhere to keep
the application handles you're monitoring regardless of the solution.
The code making the YARN request needs to somehow update those
handles. Whether there's a task per handle that is submitted to a
thread pool, or some map or list tracking all available handles that
are then updated by the single thread talking to YARN, it doesn't
matter.

In the first case your thread pool is the "shared data structure", in
the second case this map of handles is the "shared data structure", so
I don't understand why you think there is any difference.
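
To make the second option concrete, here is a minimal Java sketch of
a map of independent handles updated by one polling thread. All names
(AppHandle, BulkMonitor, bulkFetch) are made up for illustration, and
bulkFetch is only a stand-in for whatever bulk YARN call turns out to
exist; this is not existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical handle: holds the last known state of one application.
final class AppHandle {
    private final String appId;
    private volatile String state = "UNKNOWN";

    AppHandle(String appId) { this.appId = appId; }

    String appId() { return appId; }
    String state() { return state; }
    void update(String newState) { this.state = newState; }
}

// Hypothetical monitor: one bulk request per poll, fanned out to the handles.
final class BulkMonitor {
    private final Map<String, AppHandle> handles = new ConcurrentHashMap<>();
    // Assumed bulk call: takes application IDs, returns appId -> state.
    private final Function<List<String>, Map<String, String>> bulkFetch;

    BulkMonitor(Function<List<String>, Map<String, String>> bulkFetch) {
        this.bulkFetch = bulkFetch;
    }

    AppHandle register(String appId) {
        return handles.computeIfAbsent(appId, AppHandle::new);
    }

    void unregister(String appId) {
        handles.remove(appId);
    }

    // Meant to be called periodically from the single thread talking to YARN.
    void pollOnce() {
        List<String> ids = new ArrayList<>(handles.keySet());
        if (ids.isEmpty()) {
            return; // nothing to monitor, so no request at all
        }
        Map<String, String> states = bulkFetch.apply(ids);
        states.forEach((id, state) -> {
            AppHandle handle = handles.get(id);
            if (handle != null) { // may have been unregistered meanwhile
                handle.update(state);
            }
        });
    }
}
```

Each handle is still read and updated on its own; the only thing
shared is the registry, the same way a thread pool's task queue would
be shared in the task-per-handle version.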

I'm proposing a different approach that I'm pretty sure is easier on
YARN, which is a shared service that we should be trying not to
unnecessarily overload. The least I'd expect is for you to consider
the suggestion and actually explain why it wouldn't work, but so far
you've just been deflecting feedback.

You can, for example, see if such a bulk API exists and reply "I
couldn't find it". I believe it must exist; after all, I can go to the
RM web UI and see all applications, and get a list of them from the
YARN REST API. But if it doesn't exist, that would take care of my
suggestion.
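
As a starting point for that investigation, this is roughly what a
filtered listing request against the RM REST API could look like. It
is a sketch from memory: I'm assuming the Cluster Applications
endpoint (/ws/v1/cluster/apps) and its "states" and "applicationTags"
query parameters, which should be checked against the docs for the
Hadoop version in use. The class name, host name and tag below are
placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Hypothetical helper that only builds the request URL; it does not call YARN.
final class RmAppsQuery {
    private RmAppsQuery() {}

    static String build(String rmBaseUrl, List<String> states, String tag) {
        StringBuilder url = new StringBuilder(rmBaseUrl)
            .append("/ws/v1/cluster/apps")               // assumed listing endpoint
            .append("?states=").append(String.join(",", states));
        if (tag != null && !tag.isEmpty()) {
            // Assumed filter so only this service's applications come back.
            url.append("&applicationTags=").append(encode(tag));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

If a filter like this works, one request per polling interval would
cover every application being monitored.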

> "I would investigate whether there's any API in YARN to do a bulk get of
> running applications with a particular filter;" - from your email
>
> If you suggest something, please find evidence to support you

"I would investigate" is a suggestion that you investigate that as
part of proposing your change. It's not me saying that I'll do it
myself (that would be "I will investigate").

>> What if YARN goes down? What if your datacenter has a massive power
>> failure? You have to handle errors in any scenario.
>
> Again, I am describing one concrete scenario which is always involved in
> any bulk operation and even we go to bulk direction, you have to handle
> this. Since you proposed this bulk operation, I am asking you what's your
> expectation about this.

I'm expecting that errors be handled regardless of the situation. If
YARN returns an error to you, regardless of whether it was a request
for a single application status or for a bunch of them, your code
needs to handle it somehow. The handling will most probably be the
same in both cases (retry), and that's my point.
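
To illustrate, a retry wrapper does not care what kind of request it
wraps. A minimal sketch, with a made-up helper name and a fixed
backoff chosen only for brevity:

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries any request, bulk or single-application.
final class Retry {
    private Retry() {}

    static <T> T withRetry(Callable<T> request, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (InterruptedException e) {
                // Do not retry when the polling thread is being shut down.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // fixed backoff between attempts
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

The same call site works whether the Callable asks for one
application's status or for all of them.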

-- 
Marcelo
