Re: How to access lost executor log file

2015-10-01 Thread Lan Jiang
Ted,

Thanks for your reply.

First of all, after sending email to the mailing list, I used 'yarn logs
-applicationId <application ID>' to retrieve the aggregated logs
successfully, and I found the exceptions I was looking for.
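
[Archive note: once the aggregated log has been pulled down, e.g. with
'yarn logs -applicationId <application ID> > app.log', a small script can
split it into per-container sections to locate a lost executor's stderr.
This is only a sketch: the 'Container: ...' header format it matches is the
typical Hadoop 2.x aggregated-log layout and may differ in other versions.]

```python
import re

def split_by_container(log_text):
    """Split aggregated `yarn logs` output into per-container sections.

    Assumes Hadoop 2.x-style section headers of the form
    'Container: container_<id> on <host>'; adjust the pattern if your
    version formats them differently.
    """
    sections = {}
    current = None
    for line in log_text.splitlines():
        match = re.match(r"Container: (container_\S+)", line)
        if match:
            # Start a new section for this container's logs.
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {cid: "\n".join(body) for cid, body in sections.items()}

if __name__ == "__main__":
    # Tiny sample in the same layout as real aggregated output.
    sample = (
        "Container: container_0001_01_000002 on node1_45454\n"
        "LogType: stderr\n"
        "java.io.IOException: Broken pipe\n"
        "Container: container_0001_01_000003 on node2_45454\n"
        "LogType: stderr\n"
        "INFO Executor: finished\n"
    )
    for cid, body in split_by_container(sample).items():
        if "Exception" in body or "Error" in body:
            print(cid)  # container whose stderr holds an exception
```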

Now as to your suggestion: when I go to the YARN RM UI, I can only see the
"Tracking URL" in the application overview section. When I click it, it
brings me to the Spark history server UI, where I cannot find the lost
executors. The only logs link I can find on the YARN RM site is the
ApplicationMaster log, which is not what I need. Did I miss something?

Lan

On Thu, Oct 1, 2015 at 1:30 PM, Ted Yu  wrote:

> Can you go to the YARN RM UI to find all the attempts for this Spark job?
>
> The two lost executors should be found there.
>
> On Thu, Oct 1, 2015 at 10:30 AM, Lan Jiang  wrote:
>
>> Hi, there
>>
>> When running a Spark job on YARN, 2 executors somehow got lost during the
>> execution. The message on the history server GUI is “CANNOT find address”.
>> Two extra executors were launched by YARN and eventually finished the job.
>> Usually I go to the “Executors” tab on the UI to check the executor
>> stdout/stderr for troubleshooting. Now if I go to the “Executors” tab, I do
>> not see the 2 executors that were lost; I can only see the remaining
>> executors and the 2 new ones, so I cannot check the stdout/stderr of the
>> lost executors. How can I access the log files of these lost executors to
>> find out why they were lost?
>>
>> Thanks
>>
>> Lan


Re: How to access lost executor log file

2015-10-01 Thread Ted Yu
Can you go to the YARN RM UI to find all the attempts for this Spark job?

The two lost executors should be found there.
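
[Archive note: the attempts can also be fetched by script. The
ResourceManager exposes them over its REST API at
'/ws/v1/cluster/apps/<app id>/appattempts'. The sketch below pulls the
per-attempt log links out of that response; the endpoint and field names
follow the Hadoop 2.x RM REST API docs, so verify them against your
version, and 'http://resourcemanager:8088' is a placeholder address.]

```python
import json
from urllib.request import urlopen

# Placeholder RM web address; replace with your cluster's RM host:port.
RM = "http://resourcemanager:8088"

def attempt_log_links(payload):
    """Extract (attempt id, logsLink) pairs from an appattempts response.

    Expects the Hadoop 2.x RM REST shape:
    {"appAttempts": {"appAttempt": [{"id": ..., "logsLink": ...}, ...]}}
    """
    attempts = payload.get("appAttempts", {}).get("appAttempt", [])
    return [(a.get("id"), a.get("logsLink")) for a in attempts]

def fetch_attempts(app_id):
    """GET /ws/v1/cluster/apps/<app_id>/appattempts from the RM."""
    url = "%s/ws/v1/cluster/apps/%s/appattempts" % (RM, app_id)
    with urlopen(url) as resp:
        return attempt_log_links(json.load(resp))
```

The logsLink values point at the NodeManager container-log pages, which is
where a lost executor's stderr would live while the app is running.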

On Thu, Oct 1, 2015 at 10:30 AM, Lan Jiang  wrote:

> Hi, there
>
> When running a Spark job on YARN, 2 executors somehow got lost during the
> execution. The message on the history server GUI is “CANNOT find address”.
> Two extra executors were launched by YARN and eventually finished the job.
> Usually I go to the “Executors” tab on the UI to check the executor
> stdout/stderr for troubleshooting. Now if I go to the “Executors” tab, I do
> not see the 2 executors that were lost; I can only see the remaining
> executors and the 2 new ones, so I cannot check the stdout/stderr of the
> lost executors. How can I access the log files of these lost executors to
> find out why they were lost?
>
> Thanks
>
> Lan


Re: How to access lost executor log file

2015-10-01 Thread Ted Yu
Looks like the Spark history server should take the lost executors into
account by analyzing the output of the 'yarn logs -applicationId' command.

Cheers

On Thu, Oct 1, 2015 at 11:46 AM, Lan Jiang  wrote:

> Ted,
>
> Thanks for your reply.
>
> First of all, after sending email to the mailing list, I used 'yarn logs
> -applicationId <application ID>' to retrieve the aggregated logs
> successfully, and I found the exceptions I was looking for.
>
> Now as to your suggestion: when I go to the YARN RM UI, I can only see the
> "Tracking URL" in the application overview section. When I click it, it
> brings me to the Spark history server UI, where I cannot find the lost
> executors. The only logs link I can find on the YARN RM site is the
> ApplicationMaster log, which is not what I need. Did I miss something?
>
> Lan
>
> On Thu, Oct 1, 2015 at 1:30 PM, Ted Yu  wrote:
>
>> Can you go to the YARN RM UI to find all the attempts for this Spark job?
>>
>> The two lost executors should be found there.
>>
>> On Thu, Oct 1, 2015 at 10:30 AM, Lan Jiang  wrote:
>>
>>> Hi, there
>>>
>>> When running a Spark job on YARN, 2 executors somehow got lost during
>>> the execution. The message on the history server GUI is “CANNOT find
>>> address”. Two extra executors were launched by YARN and eventually
>>> finished the job. Usually I go to the “Executors” tab on the UI to check
>>> the executor stdout/stderr for troubleshooting. Now if I go to the
>>> “Executors” tab, I do not see the 2 executors that were lost; I can only
>>> see the remaining executors and the 2 new ones, so I cannot check the
>>> stdout/stderr of the lost executors. How can I access the log files of
>>> these lost executors to find out why they were lost?
>>>
>>> Thanks
>>>
>>> Lan