Hi Jeff and Prabhu,

Thanks for your help.

I looked deeper into the NodeManager log and found an error message like
this:
2016-03-02 03:13:59,692 ERROR
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error
opening leveldb file
file:/data/yarn/cache/yarn/nm-local-dir/registeredExecutors.ldb.
Creating new file, will not be able to recover state for existing
applications

This error message is also reported in the following JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-13622

The reason for this problem is that, in core-site.xml, I had set
hadoop.tmp.dir as follows:
    <property>
         <name>hadoop.tmp.dir</name>
         <value>file:/home/xs6/hadoop-2.7.1/tmp</value>
    </property>

I solved the problem by removing "file:" from the value field.
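
For reference, the corrected property (same path, just without the "file:"
scheme) now reads:

    <property>
         <name>hadoop.tmp.dir</name>
         <value>/home/xs6/hadoop-2.7.1/tmp</value>
    </property>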

Thanks!

Xiaoye


On Wed, Mar 2, 2016 at 10:02 PM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:

> Were all NodeManager services restarted after the change in yarn-site.xml?
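>
> For example, on each node, something like this (assuming a standard
> Hadoop 2.7.1 layout under $HADOOP_HOME):
>
>     $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
>     $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager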
>
> On Thu, Mar 3, 2016 at 6:00 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> The executor may have failed to start. You need to check the executor
>> logs; if there is no executor log, then check the NodeManager log.
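>>
>> For example, if log aggregation is enabled, you can pull all container
>> logs for the application with (application id taken from the log below):
>>
>>     yarn logs -applicationId application_1456905762620_0002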
>>
>> On Wed, Mar 2, 2016 at 4:26 PM, Xiaoye Sun <sunxiaoy...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am very new to Spark and YARN.
>>>
>>> I am running the BroadcastTest example application using Spark 1.6.0 and
>>> Hadoop/YARN 2.7.1 on a 5-node cluster.
>>>
>>> I set up my configuration files according to
>>> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>>>
>>> 1. copy
>>> ./spark-1.6.0/network/yarn/target/scala-2.10/spark-1.6.0-yarn-shuffle.jar
>>> to /hadoop-2.7.1/share/hadoop/yarn/lib/
>>> 2. yarn-site.xml is like this (shuffle-service entries sketched after
>>> this list):
>>> http://www.owlnet.rice.edu/~xs6/yarn-site.xml
>>> 3. spark-defaults.conf is like this
>>> http://www.owlnet.rice.edu/~xs6/spark-defaults.conf
>>> 4. spark-env.sh is like this
>>> http://www.owlnet.rice.edu/~xs6/spark-env.sh
>>> 5. the command I use to submit the Spark application is: ./bin/spark-submit
>>> --class org.apache.spark.examples.BroadcastTest --master yarn --deploy-mode
>>> cluster ./examples/target/spark-examples_2.10-1.6.0.jar 1 10000000 Http
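>>>
>>> In case the links above go stale: per the Spark dynamic-allocation docs,
>>> the yarn-site.xml entries for the external shuffle service typically look
>>> like this (taken from the docs, not verbatim from my files):
>>>
>>>     <property>
>>>       <name>yarn.nodemanager.aux-services</name>
>>>       <value>mapreduce_shuffle,spark_shuffle</value>
>>>     </property>
>>>     <property>
>>>       <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>>>       <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>>>     </property>
>>>
>>> with spark-defaults.conf enabling:
>>>
>>>     spark.shuffle.service.enabled true
>>>     spark.dynamicAllocation.enabled true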
>>>
>>> However, the job is stuck in RUNNING status, and by looking at the log,
>>> I found that the executors fail or are cancelled frequently...
>>> Here is the log output http://www.owlnet.rice.edu/~xs6/stderr
>>> It shows something like
>>>
>>> 16/03/02 02:07:35 WARN yarn.YarnAllocator: Container marked as failed: 
>>> container_1456905762620_0002_01_000002 on host: bold-x.rice.edu. Exit 
>>> status: 1. Diagnostics: Exception from container-launch.
>>>
>>>
>>> Does anybody know what the problem is here?
>>> Best,
>>> Xiaoye
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
