Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
I put the design requirements and description in the commit comment, so I will close the PR. Please refer to the following commit: https://github.com/AlpineNow/spark/commit/5b336bbfe92eabca7f4c20e5d49e51bb3721da4d On Mon, May 25, 2015 at 3:21 PM, Chester Chen ches...@alpinenow.com wrote: All,

Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
All, I have created a PR just for the purpose of helping document the use case, requirements and design. As it is unlikely to get merged in, it is only used to illustrate the problems we are trying to solve and the approaches we took. https://github.com/apache/spark/pull/6398 Hope this

Re: Change for submitting to yarn in 1.3.1

2015-05-22 Thread Marcelo Vanzin
Hi Kevin, One thing that might help you in the meantime, while we work on a better interface for all this... On Thu, May 21, 2015 at 5:21 PM, Kevin Markey kevin.mar...@oracle.com wrote: Making *yarn.Client* private has prevented us from moving from Spark 1.0.x to Spark 1.2 or 1.3 despite many

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Kevin Markey
This is an excellent discussion.  As mentioned in an earlier email, we agree with a number of Chester's suggestions, but we have yet other concerns.  I've researched this further in the past several days, and I've queried my team.  This email attempts to

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Marcelo Vanzin
Hi Kevin, I read through your e-mail and I see two main things you're talking about. - You want a public YARN Client class and don't really care about anything else. In your message you already mention why that's not a good idea. It's much better to have a standardized submission API. As you

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Nathan Kronenfeld
In researching and discussing these issues with Cloudera and others, we've been told that only one mechanism is supported for starting Spark jobs: the *spark-submit* scripts. Is this new? We've been submitting jobs directly from a programmatically created spark context (instead of through

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Marcelo Vanzin
Hi Nathan, On Thu, May 21, 2015 at 7:30 PM, Nathan Kronenfeld nkronenfeld@uncharted.software wrote: In researching and discussing these issues with Cloudera and others, we've been told that only one mechanism is supported for starting Spark jobs: the *spark-submit* scripts. Is this new?

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Nathan Kronenfeld
Thanks, Marcelo. Instantiating SparkContext directly works. Well, sorta: it has limitations. For example, see discussions about Spark not really liking multiple contexts in the same JVM. It also does not work in cluster deploy mode. That's fine - when one is doing something out of
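
[Editor's note: a minimal sketch of the programmatic SparkContext launch discussed in this sub-thread, assuming HADOOP_CONF_DIR and the Spark/YARN jars are already on the classpath; the app name and memory setting are placeholders.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ProgrammaticLaunch {
      def main(args: Array[String]): Unit = {
        // Build the configuration by hand instead of going through spark-submit.
        // "yarn-client" keeps the driver in this JVM; cluster deploy mode is not
        // reachable this way, as noted above.
        val conf = new SparkConf()
          .setMaster("yarn-client")
          .setAppName("programmatic-launch")   // placeholder name
          .set("spark.executor.memory", "2g")  // placeholder setting

        val sc = new SparkContext(conf)
        try {
          // Only one active SparkContext per JVM is reliably supported.
          println(s"count = ${sc.parallelize(1 to 1000).count()}")
        } finally {
          sc.stop()
        }
      }
    }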

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Koert Kuipers
We also launch jobs programmatically, both in standalone mode and in yarn-client mode. In standalone mode it always worked; in yarn-client mode we ran into some issues and were forced to use spark-submit, but I still have on my todo list to move back to a normal Java launch without spark-submit at

Re: Change for submitting to yarn in 1.3.1

2015-05-15 Thread Chester At Work
Marcelo, Thanks for the comments. All my requirements come from our work over the last year in yarn-cluster mode, so I am biased on the YARN side. It's true that some of the tasks might be accomplished with a separate YARN API call; the API just does not seem to be that natural any more if
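
[Editor's note: for context, one form a "separate yarn API call" can take is Hadoop's YarnClient. A rough sketch, assuming yarn-site.xml is on the classpath; the application id is a placeholder.]

    import org.apache.hadoop.yarn.client.api.YarnClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.ConverterUtils

    object YarnStatusCheck {
      def main(args: Array[String]): Unit = {
        // Talk to the ResourceManager directly through Hadoop's client API.
        val yarnClient = YarnClient.createYarnClient()
        yarnClient.init(new YarnConfiguration())
        yarnClient.start()

        // Placeholder id; in practice it comes from whatever submitted the job.
        val appId = ConverterUtils.toApplicationId("application_1431500000000_0042")
        val report = yarnClient.getApplicationReport(appId)
        println(s"state=${report.getYarnApplicationState}, tracking=${report.getTrackingUrl}")

        yarnClient.stop()
      }
    }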

Re: Change for submitting to yarn in 1.3.1

2015-05-15 Thread Marcelo Vanzin
Hi Chester, Writing a design / requirements doc sounds great. One comment though: On Thu, May 14, 2015 at 11:18 PM, Chester At Work ches...@alpinenow.com wrote: For #5 yes, it's about the command line args. These args are the input for the Spark jobs. Seems a bit too much to create

Re: Change for submitting to yarn in 1.3.1

2015-05-14 Thread Marcelo Vanzin
Hi Chester, Thanks for the feedback. A few of those are great candidates for improvements to the launcher library. On Wed, May 13, 2015 at 5:44 AM, Chester At Work ches...@alpinenow.com wrote: 1) Client should not be private (unless an alternative is provided) so we can call it directly.

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Patrick Wendell
Hey Chester, Thanks for sending this. It's very helpful to have this list. The reason we made the Client API private was that it was never intended to be used by third parties programmatically and we don't intend to support it in its current form as a stable API. We thought the fact that it was

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Chester @work
Patrick, Thanks for responding. Yes, many of our feature requests are not related to the private Client. These are the things I have been working with since last year. I have been trying to push the PR for these changes. If the new Launcher lib is the way to go, we will try to work with the new APIs.

Re: Change for submitting to yarn in 1.3.1

2015-05-12 Thread Marcelo Vanzin
On Tue, May 12, 2015 at 11:34 AM, Kevin Markey kevin.mar...@oracle.com wrote: I understand that SparkLauncher was supposed to address these issues, but it really doesn't. Yarn already provides indirection and an arm's length transaction for starting Spark on a cluster. The launcher introduces
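
[Editor's note: for reference, a rough sketch of the launcher-library style of submission being debated here; the API was still settling at the time, so treat this as approximate. Spark home, jar path, main class and arguments are placeholders.]

    import org.apache.spark.launcher.SparkLauncher

    object LaunchViaLauncher {
      def main(args: Array[String]): Unit = {
        // The launcher still runs spark-submit under the hood; it just builds
        // and spawns that child process from application code.
        val process = new SparkLauncher()
          .setSparkHome("/opt/spark")             // placeholder
          .setAppResource("/path/to/my-app.jar")  // placeholder
          .setMainClass("com.example.MyApp")      // placeholder
          .setMaster("yarn-cluster")
          .setConf("spark.executor.memory", "2g")
          .addAppArgs("arg1", "arg2")
          .launch()

        // launch() hands back a plain java.lang.Process; monitoring the job or
        // recovering its YARN application id is still left to the caller,
        // which is part of what this thread is debating.
        val exitCode = process.waitFor()
        println(s"spark-submit exited with code $exitCode")
      }
    }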

Re: Change for submitting to yarn in 1.3.1

2015-05-11 Thread Mridul Muralidharan
That works when it is launched from the same process - which is unfortunately not our case :-) - Mridul On Sun, May 10, 2015 at 9:05 PM, Manku Timma manku.tim...@gmail.com wrote: sc.applicationId gives the yarn appid. On 11 May 2015 at 08:13, Mridul Muralidharan mri...@gmail.com wrote: We had a

Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Manku Timma
sc.applicationId gives the yarn appid. On 11 May 2015 at 08:13, Mridul Muralidharan mri...@gmail.com wrote: We had a similar requirement, and as a stopgap, I currently use a suboptimal impl specific workaround - parsing it out of the stdout/stderr (based on log config). A better means to get
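
[Editor's note: Manku's suggestion in code form - a minimal sketch that assumes the SparkContext lives in the calling JVM (e.g. yarn-client mode), which, as noted above, is exactly the situation Mridul does not have.]

    import org.apache.spark.{SparkConf, SparkContext}

    object AppIdFromContext {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("yarn-client").setAppName("appid-demo"))

        // On YARN this returns something like "application_1431500000000_0042".
        println(s"application id: ${sc.applicationId}")

        sc.stop()
      }
    }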

Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Mridul Muralidharan
We had a similar requirement, and as a stopgap, I currently use a suboptimal impl-specific workaround - parsing it out of the stdout/stderr (based on log config). A better means to get to this is indeed required! Regards, Mridul On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!
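
[Editor's note: a rough illustration of the stopgap described here - spawn spark-submit as a child process and regex-match the application id out of its log output. The spark-submit arguments are placeholders, and the log line format depends on log4j configuration, which is why the workaround is impl-specific.]

    import java.io.{BufferedReader, InputStreamReader}

    object AppIdFromLogs {
      // Matches ids like "application_1431500000000_0042" anywhere in a line.
      private val AppIdPattern = """application_\d+_\d+""".r

      def main(args: Array[String]): Unit = {
        // Placeholder spark-submit invocation; flags and paths are illustrative.
        val builder = new ProcessBuilder(
          "spark-submit", "--master", "yarn-cluster",
          "--class", "com.example.MyApp", "/path/to/my-app.jar")
        builder.redirectErrorStream(true) // merge stderr, since log output often goes there

        val process = builder.start()
        val reader = new BufferedReader(new InputStreamReader(process.getInputStream))

        // Scan the child's output until an application id shows up.
        // (A real version should keep draining the stream before waitFor().)
        val appId = Iterator.continually(reader.readLine())
          .takeWhile(_ != null)
          .map(line => AppIdPattern.findFirstIn(line))
          .collectFirst { case Some(id) => id }

        println(appId.getOrElse("no application id seen in the output"))
        process.waitFor()
      }
    }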

Change for submitting to yarn in 1.3.1

2015-05-10 Thread Ron's Yahoo!
Hi, I used to submit my Spark YARN applications using the org.apache.spark.deploy.yarn.Client API so that I could get the application id after submitting. The following is the code that I have, but after upgrading to 1.3.1, the yarn Client class was made into a private class. Is there a particular