On Wed, Aug 16, 2017 at 9:33 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>> What I proposed is having a single request to YARN to get all applications'
>> statuses, if that's possible. You'd still have multiple application handles
>> that are independent of each other. They'd all be updated separately from
>> that one thread talking to YARN. This has nothing to do with a "shared data
>> structure". There's no shared data structure here to track application
>> status.
>
> You are still avoiding the questions how you make all "application handles"
> accessible to this thread

I really don't understand what you mean. You need somewhere to keep the application handles you're monitoring regardless of the solution. The code making the YARN request needs to somehow update those handles. Whether there's a task per handle that is submitted to a thread pool, or some map or list tracking all available handles that are then updated by the single thread talking to YARN, it doesn't matter. In the first case your thread pool is the "shared data structure"; in the second case this map of handles is the "shared data structure". So I don't understand why you think there is any difference.

I'm proposing a different approach that I'm pretty sure is easier on YARN, which is a shared service that we should be trying not to overload unnecessarily. The least I'd expect is for you to consider the suggestion and actually explain why it wouldn't work, but so far you've just been deflecting feedback.

You can, for example, see if such a bulk API exists and reply "I couldn't find it". I believe it must exist; after all, I can go to the RM web UI and see all applications, and get a list of them from the YARN REST API. But if it doesn't exist, that would take care of my suggestion.

> "I would investigate whether there's any API in YARN to do a bulk get of
> running applications with a particular filter;" - from your email
>
> If you suggest something, please find evidence to support you

"I would investigate" is a suggestion that you investigate that as part of proposing your change. It's not me saying that I'll do it myself (that would be "I will investigate").

>> What if YARN goes down? What if your datacenter has a massive power
>> failure? You have to handle errors in any scenario.
>
> Again, I am describing one concrete scenario which is always involved in
> any bulk operation and even we go to bulk direction, you have to handle
> this. Since you proposed this bulk operation, I am asking you what's your
> expectation about this.

I'm expecting that errors be handled regardless of the situation. If YARN returns an error to you, regardless of whether it was a request for a single application status or for a bunch of them, your code needs to handle it somehow. The handling will most probably be the same in both cases (retry), and that's my point.

--
Marcelo
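
For illustration only, the single-thread approach discussed above could be sketched roughly as follows. This is a minimal Python sketch, not Livy or YARN code: `AppHandle`, `BulkPoller`, and the injected `fetch_states` callable are all hypothetical names, and `fetch_states` merely stands in for whatever bulk status call YARN turns out to offer. One poller asks for the state of every registered application in a single request, updates each independent handle, and retries on error the same way a per-application request would.

```python
import threading
from typing import Callable, Dict, List


class AppHandle:
    """Hypothetical per-application handle; each one is independent of the others."""

    def __init__(self, app_id: str) -> None:
        self.app_id = app_id
        self.state = "UNKNOWN"


class BulkPoller:
    """Tracks handles in a map and refreshes all of them with one bulk request."""

    def __init__(
        self,
        fetch_states: Callable[[List[str]], Dict[str, str]],
        max_retries: int = 3,
    ) -> None:
        # fetch_states is an assumption: any callable that takes application
        # ids and returns {app_id: state}, raising IOError on failure.
        self._fetch_states = fetch_states
        self._max_retries = max_retries
        self._handles: Dict[str, AppHandle] = {}
        self._lock = threading.Lock()

    def register(self, handle: AppHandle) -> None:
        """Start monitoring a handle; safe to call from any thread."""
        with self._lock:
            self._handles[handle.app_id] = handle

    def poll_once(self) -> bool:
        """Issue one bulk request (with retries); return True on success."""
        with self._lock:
            handles = dict(self._handles)  # snapshot so we don't hold the lock
        if not handles:
            return True
        for _ in range(self._max_retries):
            try:
                states = self._fetch_states(list(handles))
            except IOError:
                continue  # same handling a single-application request would need
            for app_id, state in states.items():
                handle = handles.get(app_id)
                if handle is not None:
                    handle.state = state
            return True
        return False
```

In a real service `poll_once` would be called periodically from a single background thread, so the number of requests sent to YARN stays constant no matter how many applications are being monitored.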