[ https://issues.apache.org/jira/browse/SPARK-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050270#comment-15050270 ]
Sebastian YEPES FERNANDEZ commented on SPARK-12239:
---------------------------------------------------

[~sunrui] Thanks for the workaround, it works! Our real use case is actually to run SparkR through RStudio Server; I only used plain R to simplify reproducing the problem.

> SparkR - Not distributing SparkR module in YARN
> ------------------------------------------------
>
>                 Key: SPARK-12239
>                 URL: https://issues.apache.org/jira/browse/SPARK-12239
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, YARN
>    Affects Versions: 1.5.2, 1.5.3
>            Reporter: Sebastian YEPES FERNANDEZ
>            Priority: Critical
>
> Hello,
> I am trying to use SparkR in a YARN environment and have run into the following problem:
> everything works correctly when using bin/sparkR, but running the same jobs by loading SparkR directly from R does not work.
> I have tracked down the cause: when SparkR is launched through R, the "SparkR" module is not distributed to the worker nodes.
> I have tried to work around this with the setting "spark.yarn.dist.archives", but it does not help: the archive is deployed (and the extracted folder linked) under its ".zip" name, while the workers actually look for a folder named "sparkr".
> Is there currently any way to make this work?
> {code}
> # spark-defaults.conf
> spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip
>
> # R
> library(SparkR, lib.loc="/opt/apps/spark/R/lib/")
> sc <- sparkR.init(appName="SparkR", master="yarn-client", sparkEnvir=list(spark.executor.instances="1"))
> sqlContext <- sparkRSQL.init(sc)
> df <- createDataFrame(sqlContext, faithful)
> head(df)
> 15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
> {code}
> Container stderr:
> {code}
> 15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 530.0 MB)
> 15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file '/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002/sparkr/SparkR/worker/daemon.R': No such file or directory
> 15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:545)
>         at java.net.ServerSocket.accept(ServerSocket.java:513)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
> {code}
> Worker node that ran the container:
> {code}
> # ls -la /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_000002
> total 71M
> drwx--x--- 3 yarn hadoop 4.0K Dec  9 09:04 .
> drwx--x--- 7 yarn hadoop 4.0K Dec  9 09:04 ..
> -rw-r--r-- 1 yarn hadoop  110 Dec  9 09:03 container_tokens
> -rw-r--r-- 1 yarn hadoop   12 Dec  9 09:03 .container_tokens.crc
> -rwx------ 1 yarn hadoop  736 Dec  9 09:03 default_container_executor_session.sh
> -rw-r--r-- 1 yarn hadoop   16 Dec  9 09:03 .default_container_executor_session.sh.crc
> -rwx------ 1 yarn hadoop  790 Dec  9 09:03 default_container_executor.sh
> -rw-r--r-- 1 yarn hadoop   16 Dec  9 09:03 .default_container_executor.sh.crc
> -rwxr-xr-x 1 yarn hadoop  61K Dec  9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
> -rwxr-xr-x 1 yarn hadoop 317K Dec  9 09:04 kafka-clients-0.8.2.2.jar
> -rwx------ 1 yarn hadoop 6.0K Dec  9 09:03 launch_container.sh
> -rw-r--r-- 1 yarn hadoop   56 Dec  9 09:03 .launch_container.sh.crc
> -rwxr-xr-x 1 yarn hadoop 2.2M Dec  9 09:04 spark-cassandra-connector_2.10-1.5.0-M3.jar
> -rwxr-xr-x 1 yarn hadoop 7.1M Dec  9 09:04 spark-csv-assembly-1.3.0.jar
> lrwxrwxrwx 1 yarn hadoop  119 Dec  9 09:03 __spark__.jar -> /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
> lrwxrwxrwx 1 yarn hadoop   84 Dec  9 09:03 sparkr.zip -> /hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
> -rwxr-xr-x 1 yarn hadoop 1.8M Dec  9 09:04 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop  11M Dec  9 09:04 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop  48M Dec  9 09:04 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
> drwx--x--- 2 yarn hadoop   46 Dec  9 09:04 tmp
> {code}
> *Working case:*
> {code}
> # sparkR --master yarn-client --num-executors 1
> df <- createDataFrame(sqlContext, faithful)
> head(df)
>   eruptions waiting
> 1     3.600      79
> 2     1.800      54
> 3     3.333      74
> 4     2.283      62
> 5     4.533      85
> 6     2.883      55
> {code}
> Worker node that ran the container:
> {code}
> # ls -la /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1170/container_e44_1445706872927_1170_01_000002/
> total 71M
> drwx--x--- 3 yarn hadoop 4.0K Dec  9 09:14 .
> drwx--x--- 6 yarn hadoop 4.0K Dec  9 09:14 ..
> -rw-r--r-- 1 yarn hadoop  110 Dec  9 09:14 container_tokens
> -rw-r--r-- 1 yarn hadoop   12 Dec  9 09:14 .container_tokens.crc
> -rwx------ 1 yarn hadoop  736 Dec  9 09:14 default_container_executor_session.sh
> -rw-r--r-- 1 yarn hadoop   16 Dec  9 09:14 .default_container_executor_session.sh.crc
> -rwx------ 1 yarn hadoop  790 Dec  9 09:14 default_container_executor.sh
> -rw-r--r-- 1 yarn hadoop   16 Dec  9 09:14 .default_container_executor.sh.crc
> -rwxr-xr-x 1 yarn hadoop  61K Dec  9 09:14 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
> -rwxr-xr-x 1 yarn hadoop 317K Dec  9 09:14 kafka-clients-0.8.2.2.jar
> -rwx------ 1 yarn hadoop 6.3K Dec  9 09:14 launch_container.sh
> -rw-r--r-- 1 yarn hadoop   60 Dec  9 09:14 .launch_container.sh.crc
> -rwxr-xr-x 1 yarn hadoop 2.2M Dec  9 09:14 spark-cassandra-connector_2.10-1.5.0-M3.jar
> -rwxr-xr-x 1 yarn hadoop 7.1M Dec  9 09:14 spark-csv-assembly-1.3.0.jar
> lrwxrwxrwx 1 yarn hadoop  119 Dec  9 09:14 __spark__.jar -> /hadoop/hdfs/disk05/hadoop/yarn/local/usercache/spark/filecache/368/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
> lrwxrwxrwx 1 yarn hadoop   84 Dec  9 09:14 sparkr -> /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/filecache/367/sparkr.zip
> -rwxr-xr-x 1 yarn hadoop 1.8M Dec  9 09:14 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop  11M Dec  9 09:14 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
> -rwxr-xr-x 1 yarn hadoop  48M Dec  9 09:14 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
> drwx--x--- 2 yarn hadoop   46 Dec  9 09:14 tmp
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
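[Editor's note] The workaround the comment thanks [~sunrui] for is not spelled out in this message. Note that the key difference between the two container listings is the symlink name: "sparkr.zip" in the failing case versus "sparkr" in the working case. YARN's distributed cache supports a "#" fragment on archive paths that sets the link name the archive is localized under, so assuming that is the mechanism the workaround relied on, the mismatch could presumably be fixed with a one-line config change (same path as in the reproduction above; only the "#sparkr" alias is added):

{code}
# spark-defaults.conf -- a sketch of the presumed workaround, not confirmed in this thread.
# The "#sparkr" fragment asks YARN to link the extracted archive as "sparkr"
# instead of "sparkr.zip", matching the folder name the R workers look for.
spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip#sparkr
{code}

With this alias in place, the executor container should show a "sparkr" symlink, as in the working-case listing above.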