I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
log4j.properties is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
is taking precedence in the YARN containers.

Spark's log4j.properties file is correctly being bundled into the
__spark_conf__.zip file and getting added to the DistributedCache, but it
is not in the classpath of the executor, as evidenced by the following
command, which I ran in spark-shell:

scala> sc.parallelize(Seq(1)).map(_ => getClass().getResource("/log4j.properties")).first
res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
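
To rule out Spark's copy merely being shadowed rather than missing entirely, a
variation like this should list every log4j.properties the executor's
classloader can see (just a sketch of the idea; I haven't run this exact
command):

scala> sc.parallelize(Seq(1)).map { _ =>
         import scala.collection.JavaConverters._
         // ClassLoader.getResources returns *all* matches on the classpath, in order
         getClass.getClassLoader.getResources("log4j.properties").asScala.mkString("\n")
       }.first

If Spark's file were on the classpath at all, it would show up here alongside
the Hadoop one.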

I then ran the following in spark-shell to verify the classpath of the
executors:

scala> sc.parallelize(Seq(1)).map(_ => System.getProperty("java.class.path")).
         flatMap(_.split(':')).
         filter(e => !e.endsWith(".jar") && !e.endsWith("*")).
         collect.foreach(println)
...
/mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
/mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
/etc/hadoop/conf
...

So the JVM's classpath contains a nonexistent __spark_conf__ directory when it
should really contain __spark_conf__.zip (which, despite the .zip extension,
is actually a symlink to a directory).
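
A quick way to confirm that the entry really is dangling is to run an
existence check over the non-jar classpath entries from the executors; this is
just a sketch along the lines of the command above, which I haven't run
verbatim:

scala> sc.parallelize(Seq(1)).map { _ =>
         System.getProperty("java.class.path")
           .split(':')
           .filterNot(e => e.endsWith(".jar") || e.endsWith("*"))
           .map(e => s"$e exists=${new java.io.File(e).exists}")  // flag dangling entries
           .mkString("\n")
       }.first

I'd expect the __spark_conf__ entry to come back with exists=false while the
container directory itself comes back true. For reference, here's what the
container directory actually contains: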

% sudo ls -l /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
total 20
-rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
-rwx------ 1 yarn yarn  594 Jun 18 01:26 default_container_executor_session.sh
-rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
-rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip -> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ -> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
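
The same thing should be visible from inside an executor, since the container
directory is the executor's working directory; another untested sketch,
relying on that relative-path assumption:

scala> sc.parallelize(Seq(1)).map { _ =>
         // resolved relative to the container (working) directory
         val f = new java.io.File("__spark_conf__.zip")
         s"exists=${f.exists} isDirectory=${f.isDirectory}"
       }.first

If the symlink behaves the way the ls output suggests, this should print
exists=true isDirectory=true, i.e. the "zip" is really a directory.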

Does anybody know why this is happening? Is this a bug in Spark, or is the JVM
doing this (possibly because the extension is .zip)?
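
In case it helps narrow that down: if I understand the container launch
correctly, the executor JVM inherits its classpath from the CLASSPATH
environment variable that launch_container.sh exports, so comparing that
variable against java.class.path should show whether the dangling entry is
already there before the JVM starts. A sketch I haven't tried yet:

scala> sc.parallelize(Seq(1)).map(_ => System.getenv("CLASSPATH")).first

If __spark_conf__ (without the .zip) already appears there, the JVM isn't
rewriting anything, and the name must be getting mangled on the Spark/YARN
side.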

Thanks,
Jonathan
