Hi Kevin,

I never did. I checked for free space in the root partition; I don't think
that was the issue. Now that 1.4 is officially out, I'll probably give it
another shot.

On Jun 22, 2015 4:28 PM, "Kevin Markey" <kevin.mar...@oracle.com> wrote:
> Matt: Did you ever resolve this issue? When running on a cluster or
> pseudocluster with too little space for /tmp or /var files, we've seen this
> sort of behavior. There's enough memory and enough HDFS space, but there's
> insufficient space on one or more nodes for other temporary files as logs
> grow and don't get cleared or deleted. It depends on your configuration.
> Often restarting will temporarily fix things, but for shorter and shorter
> periods of time until nothing works.
>
> The fix is to expand the space available for logs, prune them (e.g. with a
> cron job that prunes them periodically), and/or modify the limits on logs.
>
> Kevin
>
> On 06/09/2015 04:15 PM, Matt Kapilevich wrote:
>
> I've tried running a Hadoop app pointing to the same queue. Same thing
> now - the job doesn't get accepted. I've cleared out the queue and killed
> all the pending jobs, but the queue is still unusable.
>
> It seems like an issue with YARN, but it's specifically Spark that leaves
> the queue in this state. I've run a Hadoop job in a for loop 10x, while
> specifying the queue explicitly, just to double-check.
>
> On Tue, Jun 9, 2015 at 4:45 PM, Matt Kapilevich <matve...@gmail.com>
> wrote:
>
>> From the RM scheduler, I see 3 applications currently stuck in the
>> "root.thequeue" queue.
>>
>> Used Resources: <memory:0, vCores:0>
>> Num Active Applications: 0
>> Num Pending Applications: 3
>> Min Resources: <memory:0, vCores:0>
>> Max Resources: <memory:6655, vCores:4>
>> Steady Fair Share: <memory:1664, vCores:0>
>> Instantaneous Fair Share: <memory:6655, vCores:0>
>>
>> On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich <matve...@gmail.com>
>> wrote:
>>
>>> Yes! If I either specify a different queue or don't specify a queue at
>>> all, it works.
>>>
>>> On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> Does it work if you don't specify a queue?
>>>>
>>>> On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich <matve...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Marcelo,
>>>>>
>>>>> Yes, restarting YARN fixes this behavior and it again works the first
>>>>> few times. The only thing that's consistent is that once Spark job
>>>>> submissions stop working, they're broken for good.
>>>>>
>>>>> On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin <van...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Apologies, I see you already posted everything from the RM logs that
>>>>>> mentions your stuck app.
>>>>>>
>>>>>> Have you tried restarting the YARN cluster to see if that changes
>>>>>> anything? Does it go back to the "first few tries work" behaviour?
>>>>>>
>>>>>> I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything
>>>>>> like this.
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin <van...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich <matve...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Like I mentioned earlier, I'm able to execute Hadoop jobs fine even
>>>>>>>> now - this problem is specific to Spark.
>>>>>>>
>>>>>>> That doesn't necessarily mean anything. Spark apps have different
>>>>>>> resource requirements than Hadoop apps.
>>>>>>>
>>>>>>> Check your RM logs for any line that mentions your Spark app id.
>>>>>>> That may give you some insight into what's happening or not.
>>>>>>>
>>>>>>> --
>>>>>>> Marcelo
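
Kevin's suggestion above - a cron job that periodically prunes old logs so
/tmp or /var doesn't fill up - could be sketched roughly like this. This is
just one possible approach using find(1); the 7-day retention window and the
log directory are illustrative assumptions (point it at whatever your actual
log dirs are, e.g. the value of yarn.nodemanager.log-dirs). Demonstrated here
against a temporary directory so it's safe to run as-is:

```shell
# Sketch of a log-pruning job (assumed retention policy: 7 days).
# In a real deployment the find line would live in /etc/cron.daily/
# and target your actual log directories, not a mktemp dir.
logdir=$(mktemp -d)

# Simulate one fresh and one stale log file (GNU touch -d).
touch "$logdir/fresh.log"
touch -d '10 days ago' "$logdir/stale.log"

# The cron job body: delete *.log files not modified in the last 7 days.
find "$logdir" -type f -name '*.log' -mtime +7 -delete

ls "$logdir"
```

Running this leaves only fresh.log behind. An alternative is to let logrotate
handle rotation and size caps instead of deleting outright, which keeps recent
history available for debugging stuck applications.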