Hi Kevin, I never did. I checked for free space in the root partition, and I
don't think that was the issue. Now that 1.4 is officially out I'll probably
give it another shot.
On Jun 22, 2015 4:28 PM, "Kevin Markey" <kevin.mar...@oracle.com> wrote:

>  Matt: Did you ever resolve this issue? When running on a cluster or
> pseudo-cluster with too little space for /tmp or /var files, we've seen this
> sort of behavior. There's enough memory and enough HDFS space, but there's
> insufficient space on one or more nodes for other temporary files as logs
> grow and don't get cleared or deleted. It depends on your configuration.
> Often restarting will temporarily fix things, but for shorter and shorter
> periods of time until nothing works.
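>
> A quick way to spot this is to check the local filesystems on every node,
> not just HDFS; for example (the paths here are typical ones, adjust for
> your layout):
>
>   df -h / /tmp /var
>   du -sh /var/log/hadoop-yarn /tmp/hadoop-*
>
> If any node is at or near 100% on the partition holding those directories,
> containers can fail to launch even though memory and HDFS space look fine.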
>
> The fix is to expand the space available for logs, prune them manually, set
> up a cron job to prune them periodically, and/or tighten the limits on log
> retention.
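>
> For example, an /etc/cron.d entry roughly like this (the paths and the
> 7-day retention are only illustrative) keeps local log directories in check:
>
>   # prune YARN/Spark log files older than 7 days, nightly at 2am
>   0 2 * * * root find /var/log/hadoop-yarn /var/log/spark -type f -mtime +7 -delete
>
> On the YARN side, yarn.nodemanager.log.retain-seconds (or
> yarn.log-aggregation.retain-seconds, if log aggregation is enabled) caps how
> long container logs are kept.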
>
> Kevin
>
> On 06/09/2015 04:15 PM, Matt Kapilevich wrote:
>
> I've tried running a Hadoop app pointing to the same queue. Same thing now:
> the job doesn't get accepted. I've cleared out the queue and killed all the
> pending jobs, and the queue is still unusable.
>
>  It seems like an issue with YARN, but it's specifically Spark that leaves
> the queue in this state. I ran a Hadoop job in a for loop 10x, specifying
> the queue explicitly, just to double-check.
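>
> Roughly like this, with the example jar and args as placeholders:
>
>   for i in $(seq 1 10); do
>     hadoop jar hadoop-mapreduce-examples.jar pi \
>       -D mapreduce.job.queuename=root.thequeue 10 100
>   done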
>
> On Tue, Jun 9, 2015 at 4:45 PM, Matt Kapilevich <matve...@gmail.com>
> wrote:
>
>> From the RM scheduler, I see 3 applications currently stuck in the
>> "root.thequeue" queue.
>>
>>  Used Resources: <memory:0, vCores:0>
>> Num Active Applications: 0
>> Num Pending Applications: 3
>> Min Resources: <memory:0, vCores:0>
>> Max Resources: <memory:6655, vCores:4>
>> Steady Fair Share: <memory:1664, vCores:0>
>> Instantaneous Fair Share: <memory:6655, vCores:0>
>>
>> On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich <matve...@gmail.com>
>> wrote:
>>
>>> Yes! If I either specify a different queue or don't specify a queue at
>>> all, it works.
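>>>
>>> (For reference, the queue is passed straight to spark-submit, roughly:
>>>
>>>   spark-submit --master yarn-cluster --queue thequeue app.jar
>>>
>>> where app.jar stands in for the actual application jar.)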
>>>
>>> On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> Does it work if you don't specify a queue?
>>>>
>>>> On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich <matve...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Marcelo,
>>>>>
>>>>>  Yes, restarting YARN fixes this behavior, and the first few
>>>>> submissions work again. The only thing that's consistent is that once
>>>>> Spark job submissions stop working, they stay broken for good.
>>>>>
>>>>> On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin <van...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>>  Apologies, I see you already posted everything from the RM logs
>>>>>> that mentions your stuck app.
>>>>>>
>>>>>>  Have you tried restarting the YARN cluster to see if that changes
>>>>>> anything? Does it go back to the "first few tries work" behaviour?
>>>>>>
>>>>>>  I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything
>>>>>> like this.
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin <van...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich <
>>>>>>> matve...@gmail.com> wrote:
>>>>>>>
>>>>>>>>  Like I mentioned earlier, I'm able to execute Hadoop jobs fine
>>>>>>>> even now - this problem is specific to Spark.
>>>>>>>>
>>>>>>>
>>>>>>>  That doesn't necessarily mean anything. Spark apps have different
>>>>>>> resource requirements than Hadoop apps.
>>>>>>>
>>>>>>> Check your RM logs for any line that mentions your Spark app id.
>>>>>>> That may give you some insight into what's happening or not.
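>>>>>>>
>>>>>>> Something like this, with the app id and log location adjusted for
>>>>>>> your setup (on CDH the RM log usually lives under /var/log/hadoop-yarn):
>>>>>>>
>>>>>>>   grep application_<id> /var/log/hadoop-yarn/*resourcemanager*.log*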
>>>>>>>
>>>>>>> --
>>>>>>> Marcelo
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Marcelo
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Marcelo
>>>>
>>>
>>>
>>
>
>
