Patrick,
Thanks for responding. Yes, many of these are feature requests, not issues
caused by the Client being made private. These are things I have been working on
since last year, and I have been trying to push PRs for these changes. If the new
Launcher lib is the way to go, we will try to work with the new APIs.
Thanks,
Chester
> On May 13, 2015, at 7:22 PM, Patrick Wendell <[email protected]> wrote:
>
> Hey Chester,
>
> Thanks for sending this. It's very helpful to have this list.
>
> The reason we made the Client API private was that it was never
> intended to be used by third parties programmatically and we don't
> intend to support it in its current form as a stable API. We thought
> the fact that it was for internal use would be obvious since it
> accepts arguments as a string array of CL args. It was always intended
> for command line use and the stable API was the command line.
>
> When we migrated to the Launcher library we figured we covered most of
> the use cases in the off chance someone was using the Client. It
> appears we regressed one feature which was a clean way to get the app
> ID.
>
> The items you list here 2-6 all seem like new feature requests rather
> than a regression caused by us making that API private.
>
> I think the way to move forward is for someone to design a proper
> long-term stable API for the things you mentioned here. That could
> be done by extending the Launcher library. Marcelo would be a
> natural person to help with this effort since he was heavily involved in both
> YARN support and the launcher. So I'm curious to hear his opinion on
> how best to move forward.
>
> I do see how apps that run Spark would benefit from having a control
> plane for querying status, both on YARN and elsewhere.
>
> - Patrick
>
>> On Wed, May 13, 2015 at 5:44 AM, Chester At Work <[email protected]>
>> wrote:
>> Patrick,
>> There are several things we need, some of them already mentioned on the
>> mailing list before.
>>
>> I haven't looked at the SparkLauncher code, but here are a few things we need
>> from the Spark YARN Client, from our perspective:
>>
>> 1) The Client should not be private (unless an alternative is provided), so we
>> can call it directly.
>> 2) We need a way to stop the running YARN app programmatically (the PR
>> is already submitted).
>> 3) Before we start the Spark job, there should be a callback to the
>> application that provides the YARN container capacity (number of cores
>> and max memory), so the Spark program will not set values beyond the maximums
>> (PR submitted).
>> 4) The callbacks could take the form of YARN app listeners, which are invoked
>> on YARN status changes (start, in progress, failure, complete, etc.), so the
>> application can react to these events (in PR).
>>
>> 5) The YARN client passes arguments to the Spark program as arguments to its
>> main method. We have experienced problems when passing a very large argument,
>> due to the length limit. For example, we serialize the arguments as JSON, encode
>> them, and then parse them on the other side. For datasets with many columns, we
>> run into the limit. Therefore, an alternative way of passing larger
>> arguments is needed. We are experimenting with passing the args via an
>> established Akka messaging channel.
>>
>> 6) The Spark YARN client in yarn-cluster mode is right now essentially a
>> batch job with no communication once it is launched. We need to establish a
>> communication channel so that logs, errors, status updates, progress bars,
>> execution stages, etc. can be displayed on the application side. We added an
>> Akka communication channel for this (working on a PR).
>>
>> Combined with the other items in this list, we are able to redirect print
>> and error statements to the application log (outside of the Hadoop cluster),
>> and to show a progress bar equivalent to the Spark UI's via a Spark listener.
>> We can show YARN progress via the YARN app listener before Spark has started,
>> and status can be updated during job execution.
>>
>> We are also experimenting with long-running jobs with additional Spark
>> commands and interactions via this channel.
>>
>>
>> Chester
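
A minimal sketch of what the "YARN app listener" idea in item 4 could look like. Nothing here is an existing Spark or YARN type: the names AppState, YarnAppListener and AppStateNotifier are hypothetical, and the states simply mirror the events named in the list (start, in progress, failure, complete). It assumes some other code polls YARN and feeds each observed state into report().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical states, named after the events listed in the email. */
enum AppState { STARTED, IN_PROGRESS, FAILED, COMPLETED }

/** Hypothetical callback interface; not an existing Spark or YARN type. */
interface YarnAppListener {
    /** Called whenever the observed application state changes. */
    void onStateChange(String appId, AppState newState);
}

/** Tracks one application and notifies listeners only on real transitions. */
final class AppStateNotifier {
    private final String appId;
    private final List<YarnAppListener> listeners = new ArrayList<>();
    private AppState last; // null until the first report arrives

    AppStateNotifier(String appId) {
        this.appId = appId;
    }

    void addListener(YarnAppListener listener) {
        listeners.add(listener);
    }

    /** Feed in each polled state; repeated identical states are ignored. */
    void report(AppState current) {
        if (current == last) {
            return;
        }
        last = current;
        for (YarnAppListener listener : listeners) {
            listener.onStateChange(appId, current);
        }
    }
}
```

The application side would register a listener (for example a lambda that updates a progress display) and leave the polling of YARN to whatever component owns the client.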
>>
>>> On May 12, 2015, at 20:54, Patrick Wendell <[email protected]> wrote:
>>>
>>> Hey Kevin and Ron,
>>>
>>> So is the main shortcoming of the launcher library the inability to
>>> get an app ID back from YARN? Or are there other issues here that
>>> fundamentally regress things for you?
>>>
>>> It seems like adding a way to get back the appID would be a reasonable
>>> addition to the launcher.
>>>
>>> - Patrick
>>>
>>>> On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin <[email protected]>
>>>> wrote:
>>>> On Tue, May 12, 2015 at 11:34 AM, Kevin Markey <[email protected]>
>>>> wrote:
>>>>
>>>>> I understand that SparkLauncher was supposed to address these issues, but
>>>>> it really doesn't. Yarn already provides indirection and an arm's length
>>>>> transaction for starting Spark on a cluster. The launcher introduces yet
>>>>> another layer of indirection and dissociates the Yarn Client from the
>>>>> application that launches it.
>>>>
>>>> Well, not fully. The launcher was supposed to solve "how to launch a Spark
>>>> app programmatically", but in the first version nothing was added to
>>>> actually gather information about the running app. It's also limited in the
>>>> way it works because of Spark's limitations (one context per JVM, etc).
>>>>
>>>> Still, adding things like this is something that is definitely in the scope
>>>> for the launcher library; information such as app id can be useful for the
>>>> code launching the app, not just in yarn mode. We just have to find a clean
>>>> way to provide that information to the caller.
>>>>
>>>>
>>>>> I am still reading the newest code, and we are still researching options
>>>>> to move forward. If there are alternatives, we'd like to know.
>>>>
>>>> Super hacky, but if you launch Spark as a child process you could parse the
>>>> stderr and get the app ID.
>>>>
>>>> --
>>>> Marcelo
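
A minimal sketch of the stderr-parsing workaround described above, not an existing Spark API. It assumes the child process logs the YARN application ID in the usual application_<clusterTimestamp>_<sequence> form somewhere in its output; the class name AppIdScanner is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Scans launcher output for a YARN application ID (hypothetical helper). */
final class AppIdScanner {
    // YARN application IDs look like application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    private AppIdScanner() {}

    /** Returns the first application ID found in the given lines, if any. */
    static Optional<String> firstAppId(Iterable<String> lines) {
        for (String line : lines) {
            Matcher matcher = APP_ID.matcher(line);
            if (matcher.find()) {
                return Optional.of(matcher.group());
            }
        }
        return Optional.empty();
    }
}
```

In practice the lines would come from the child process's stderr, for example by wrapping Process.getErrorStream() in a BufferedReader. The approach depends on the log format staying stable, which is why it is only a stopgap until the launcher exposes the ID directly.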
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>