Re: Issue running Spark 1.4 on Yarn

2015-06-23 Thread Matt Kapilevich
Hi Kevin, I never did. I checked for free space in the root partition; I don't think that was an issue. Now that 1.4 is officially out I'll probably give it another shot. On Jun 22, 2015 4:28 PM, Kevin Markey kevin.mar...@oracle.com wrote: Matt: Did you ever resolve this issue? When running on a

Re: Issue running Spark 1.4 on Yarn

2015-06-11 Thread matvey14
No, this is just a random queue name I picked when submitting the job; there's no specific configuration for it. I am not logged in, so I don't have the default fair scheduler configuration in front of me, but I don't think that's the problem. The cluster is completely idle; there aren't any jobs being

Re: Issue running Spark 1.4 on Yarn

2015-06-11 Thread nsalian
Hello, Since the other queues are fine, I reckon there may be a limit on the max apps or memory for this queue in particular. I don't suspect FairScheduler limits either, but on this queue we may be hitting a maximum. Could you try to get the configs for the queue? That should provide

Re: Issue running Spark 1.4 on Yarn

2015-06-10 Thread matvey14
Hi nsalian, For some reason the rest of this thread isn't showing up here. The NodeManager isn't busy. I'll copy/paste, the details are in there. I've tried running a Hadoop app pointing to the same queue. Same

Re: Issue running Spark 1.4 on Yarn

2015-06-10 Thread nsalian
Hi, Thanks for the added information; it helps add more context. Is that specific queue different from the others? FairScheduler.xml should have the information needed, or a separate allocations.xml if you have one. Something of this format: &lt;allocations&gt; &lt;queue name="sample_queue"&gt; &lt;minResources&gt;1
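The fragment above appears to be a flattened fair-scheduler allocations file. A minimal sketch of what such a file typically looks like (the queue name and resource values here are illustrative, not taken from the thread):

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="sample_queue">
    <!-- Hypothetical limits: a low maxRunningApps or small maxResources
         on one queue would keep new apps stuck in the ACCEPTED state -->
    <minResources>1024 mb,1 vcores</minResources>
    <maxResources>6655 mb,4 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
```

Checking whether the problem queue carries limits like these that the other queues lack is the comparison being asked for here.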

Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Hi all, I'm manually building Spark from source against 1.4 branch and submitting the job against Yarn. I am seeing very strange behavior. The first 2 or 3 times I submit the job, it runs fine, computes Pi, and exits. The next time I run it, it gets stuck in the ACCEPTED state. I'm kicking off a
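For reference, a SparkPi submission against YARN along the lines described would look roughly like this (the jar path and queue name are illustrative, assuming a standard Spark 1.4 distribution build):

```shell
# Submit the Pi example to YARN in cluster mode; repeated runs of a
# command like this are what eventually got stuck in ACCEPTED.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --queue thequeue \
  --num-executors 2 \
  --executor-memory 1g \
  lib/spark-examples-1.4.0-hadoop2.6.0.jar 100
```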

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
If your application is stuck in that state, it generally means your cluster doesn't have enough resources to start it. In the RM logs you can see how many vcores / memory the application is asking for, and then you can check your RM configuration to see if that's currently available on any single
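Two quick checks along the lines Marcelo suggests, using standard YARN CLI commands (the log path is a common default and may differ per install):

```shell
# List apps stuck in the ACCEPTED state, with the queue each one is in
yarn application -list -appStates ACCEPTED

# See per-node available memory/vcores as the RM sees them
yarn node -list -all

# Grep the RM log for mentions of the stuck application's resource request
grep ACCEPTED /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log
```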

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Hi Marcelo, Thanks. I think something more subtle is happening. I'm running a single-node cluster, so there's only 1 NM. When I executed the exact same job the 4th time, the cluster was idle, and there was nothing else being executed. RM currently reports that I have 6.5GB of memory and 4 cpus

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Yes! If I either specify a different queue or don't specify a queue at all, it works. On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com wrote: Does it work if you don't specify a queue? On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com wrote: Hi Marcelo,
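The difference Matt describes comes down to the `--queue` option on spark-submit (queue name and jar path illustrative):

```shell
# Hangs in ACCEPTED after a few runs: explicitly targets the bad queue
./bin/spark-submit --master yarn-cluster --queue thequeue \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-1.4.0-hadoop2.6.0.jar 100

# Works: omit --queue so the app lands in the default queue
./bin/spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-1.4.0-hadoop2.6.0.jar 100
```

That the same submission succeeds against any other queue points at per-queue scheduler state rather than cluster-wide resources.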

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
From the RM scheduler, I see 3 applications currently stuck in the root.thequeue queue.
Used Resources: memory:0, vCores:0
Num Active Applications: 0
Num Pending Applications: 3
Min Resources: memory:0, vCores:0
Max Resources: memory:6655, vCores:4
Steady Fair Share: memory:1664, vCores:0
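The queue numbers pasted here can also be pulled from the ResourceManager's REST API rather than the web UI (host and port are the usual RM defaults; adjust for your install):

```shell
# Dump FairScheduler queue state as JSON, including pending-app counts
# and the min/max/fair-share resources per queue
curl -s http://localhost:8088/ws/v1/cluster/scheduler

# Or list just the apps pending in the problem queue
yarn application -list -appStates ACCEPTED | grep root.thequeue
```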

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
Apologies, I see you already posted everything from the RM logs that mentions your stuck app. Have you tried restarting the YARN cluster to see if that changes anything? Does it go back to the "first few tries work" behaviour? I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
Does it work if you don't specify a queue? On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com wrote: Hi Marcelo, Yes, restarting YARN fixes this behavior and it again works the first few times. The only thing that's consistent is that once Spark job submissions stop working,

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Hi Marcelo, Yes, restarting YARN fixes this behavior and it again works the first few times. The only thing that's consistent is that once Spark job submissions stop working, it's broken for good. On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com wrote: Apologies, I see you

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote: Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now - this problem is specific to Spark. That doesn't necessarily mean anything. Spark apps have different resource requirements than Hadoop apps.

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread nsalian
I see the other jobs SUCCEEDED without issues. Could you snapshot the FairScheduler activity as well? My guess is that, with the single core, it is reaching a NodeManager that is still busy with other jobs, and the job ends up in a waiting state. Does the job eventually complete? Could you

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
I've tried running a Hadoop app pointing to the same queue. Same thing now, the job doesn't get accepted. I've cleared out the queue and killed all the pending jobs, the queue is still unusable. It seems like an issue with YARN, but it's specifically Spark that leaves the queue in this state.