[ 
https://issues.apache.org/jira/browse/SPARK-23123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328651#comment-16328651
 ] 

Steve Loughran commented on SPARK-23123:
----------------------------------------

I've never looked at ViewFS internals before, so treat my commentary here with 
caution:
 # Something (probably the YARN Node Manager/Resource Localizer) is trying to 
download the JAR from a viewfs URL.
 # It can't init viewfs because it isn't finding the conf entry for the mount 
table, which is *probably* {{fs.viewfs.mounttable.default}} (i.e. it will be 
that unless overridden).
 # Yet the spark-submit client can see it, which is why it manages to delete 
the staging dir.
 # Which would imply that the NM isn't getting the core-site.xml values 
configuring viewfs.
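
To make point 2 concrete, here is a rough sketch (mine, not Hadoop's code) of 
what "empty mount table" amounts to: no {{fs.viewfs.mounttable.<table>.link.*}} 
entries visible in the configuration the process loaded. The function name and 
the sample XML are made up for illustration; the property names are the ones 
from the report below. It only looks at plain {{link.}} entries in a single 
file, which is a simplification of what Hadoop really does when it loads its 
configuration.

```python
import xml.etree.ElementTree as ET

# Prefix taken from the property names in the report; "default" is the
# mount table name used there.
MOUNT_PREFIX = "fs.viewfs.mounttable."


def viewfs_links(core_site_xml, table="default"):
    """Return {mount point: target} for one mount table in core-site.xml text.

    Hypothetical helper: it only inspects simple "link." entries in one file.
    """
    root = ET.fromstring(core_site_xml)
    prefix = f"{MOUNT_PREFIX}{table}.link."
    links = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(prefix):
            links[name[len(prefix):]] = value
    return links


# Illustrative sample only; two of the links from the report.
SAMPLE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs:///</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.default.link./apps</name>
    <value>hdfs://nameservice1/apps</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.default.link./tmp</name>
    <value>hdfs://nameservice2/tmp</value>
  </property>
</configuration>
"""

links = viewfs_links(SAMPLE)
if links:
    for mount_point, target in sorted(links.items()):
        print(f"{mount_point} -> {target}")
else:
    print("no mount table links found for table 'default'")
```

Feeding it the text of the core-site.xml on the node that failed, rather than 
the sample, would show whether the NM's copy of the file has any links at all.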

Like Saisai says, I wouldn't blame Spark here; I don't think the process that 
is failing is a Spark process.

I'd try to work out which node this failed on and see what the NM logs say. If 
it's failing for this job submit, it's likely to be failing for other things 
too. Then try restarting the NM to see if the problem "goes away". If it 
doesn't, that would indicate that the settings in its local 
/etc/conf/hadoop/core-site.xml don't have the binding info.

If you are confident that it is in that file, and you've restarted the NM, and 
it's still failing in its logs, then file a YARN bug attaching the log. But 
you'd need to provide evidence that it wasn't a local config problem before 
anyone would look at it.
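
For pulling that evidence out of an NM log, something like the following would 
do. It is only a sketch: the function name and the sample lines are invented, 
and I'm assuming the NM log carries the same message text as the diagnostics in 
the report below.

```python
# Message text copied from the diagnostics in the report; the assumption is
# that the NM log contains the same string.
ERROR_MARKER = "ViewFs: Cannot initialize: Empty Mount table in config"


def find_viewfs_failures(log_lines):
    """Return (line number, line) pairs for lines containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if ERROR_MARKER in line
    ]


# Invented sample lines, just to show the shape of the result.
sample_log = [
    "INFO localizer: starting download\n",
    "java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for viewfs://default/\n",
    "INFO localizer: cleanup\n",
]

for number, line in find_viewfs_failures(sample_log):
    print(f"{number}: {line}")
```

Running it over the log from before and after the restart gives you the lines 
to attach to the YARN issue.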

> Unable to run Spark Job with Hadoop NameNode Federation using ViewFS
> --------------------------------------------------------------------
>
>                 Key: SPARK-23123
>                 URL: https://issues.apache.org/jira/browse/SPARK-23123
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.6.3
>            Reporter: Nihar Nayak
>            Priority: Major
>              Labels: Hadoop, Spark
>
> Added the following to core-site.xml in order to make use of ViewFS in a 
> NameNode federated cluster:
> {noformat}
> <property>
>   <name>fs.defaultFS</name>
>   <value>viewfs:///</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./apps</name>
>   <value>hdfs://nameservice1/apps</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./app-logs</name>
>   <value>hdfs://nameservice2/app-logs</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./tmp</name>
>   <value>hdfs://nameservice2/tmp</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./user</name>
>   <value>hdfs://nameservice2/user</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./ns1/user</name>
>   <value>hdfs://nameservice1/user</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.default.link./ns2/user</name>
>   <value>hdfs://nameservice2/user</value>
> </property>
> {noformat}
> Got the following error:
> {noformat}
> spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client 
> --num-executors 3 --driver-memory 512m --executor-memory 512m 
> --executor-cores 1 ${SPARK_HOME}/lib/spark-examples*.jar 10
> 18/01/17 02:14:45 INFO spark.SparkContext: Added JAR 
> file:/home/nayakxxxx/hdp26_c4000_stg/spark2/lib/spark-examples_2.11-2.1.1.2.6.2.0-205.jar
>  at spark://xxxxx:35633/jars/spark-examples_2.11-2.1.1.2.6.2.0-205.jar with 
> timestamp 1516155285534
> 18/01/17 02:14:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> 18/01/17 02:14:46 INFO yarn.Client: Requesting a new application from cluster 
> with 26 NodeManagers
> 18/01/17 02:14:46 INFO yarn.Client: Verifying our application has not 
> requested more than the maximum memory capability of the cluster (13800 MB 
> per container)
> 18/01/17 02:14:46 INFO yarn.Client: Will allocate AM container, with 896 MB 
> memory including 384 MB overhead
> 18/01/17 02:14:46 INFO yarn.Client: Setting up container launch context for 
> our AM
> 18/01/17 02:14:46 INFO yarn.Client: Setting up the launch environment for our 
> AM container
> 18/01/17 02:14:46 INFO yarn.Client: Preparing resources for our AM container
> 18/01/17 02:14:46 INFO security.HDFSCredentialProvider: getting token for 
> namenode: viewfs:/user/nayakxxxx
> 18/01/17 02:14:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 
> 22488202 for nayakxxxx on ha-hdfs:nameservice1
> 18/01/17 02:14:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 50 
> for nayakxxxx on ha-hdfs:nameservice2
> 18/01/17 02:14:47 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://XXXX:9083
> 18/01/17 02:14:47 INFO hive.metastore: Connected to metastore.
> 18/01/17 02:14:49 INFO security.HiveCredentialProvider: Get Token from hive 
> metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 29 6e 61 79 61 
> 6b 6e 69 68 61 72 72 61 30 31 40 53 54 47 32 30 30 30 2e 48 41 44 4f 4f 50 2e 
> 52 41 4b 55 54 45 4e 2e 43 4f 4d 04 68 69 76 65 00 8a 01 61 01 e5 be 03 8a 01 
> 61 25 f2 42 03 8d 02 21 bb 8e 02 b7
> 18/01/17 02:14:49 WARN yarn.Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 18/01/17 02:14:50 INFO yarn.Client: Uploading resource 
> file:/tmp/spark-7498ee81-d22b-426e-9466-3a08f7c827b1/__spark_libs__6643608006679813597.zip
>  -> 
> viewfs:/user/nayakxxxx/.sparkStaging/application_1515035441414_275503/__spark_libs__6643608006679813597.zip
> 18/01/17 02:14:55 INFO yarn.Client: Uploading resource 
> file:/tmp/spark-7498ee81-d22b-426e-9466-3a08f7c827b1/__spark_conf__405432153902988742.zip
>  -> 
> viewfs:/user/nayakxxxx/.sparkStaging/application_1515035441414_275503/__spark_conf__.zip
> 18/01/17 02:14:55 INFO spark.SecurityManager: Changing view acls to: nayakxxxx
> 18/01/17 02:14:55 INFO spark.SecurityManager: Changing modify acls to: 
> nayakxxxx
> 18/01/17 02:14:55 INFO spark.SecurityManager: Changing view acls groups to:
> 18/01/17 02:14:55 INFO spark.SecurityManager: Changing modify acls groups to:
> 18/01/17 02:14:55 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(nayakxxxx); 
> groups with view permissions: Set(); users  with modify permissions: 
> Set(nayakxxxx); groups with modify permissions: Set()
> 18/01/17 02:14:55 INFO yarn.Client: Submitting application 
> application_1515035441414_275503 to ResourceManager
> 18/01/17 02:14:55 INFO impl.YarnClientImpl: Submitted application 
> application_1515035441414_275503
> 18/01/17 02:14:55 INFO cluster.SchedulerExtensionServices: Starting Yarn 
> extension services with app application_1515035441414_275503 and attemptId 
> None
> 18/01/17 02:14:56 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:14:56 INFO yarn.Client:
>          client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>          diagnostics: AM container is launched, waiting for AM container to 
> Register with RM
>          ApplicationMaster host: N/A
>          ApplicationMaster RPC port: -1
>          queue: common
>          start time: 1516155295100
>          final status: UNDEFINED
>          tracking URL: 
> http://XXXX:8088/proxy/application_1515035441414_275503/
>          user: nayakxxxx
> 18/01/17 02:14:57 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:14:58 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:14:59 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:00 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:01 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:02 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:03 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:04 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:05 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:06 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:07 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:08 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:09 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:10 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:11 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:12 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:13 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:14 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:15 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:16 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:17 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:18 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:19 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:20 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:21 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:22 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:23 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:24 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:25 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:26 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:27 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:28 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:29 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:30 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:31 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:32 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:33 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: ACCEPTED)
> 18/01/17 02:15:34 INFO yarn.Client: Application report for 
> application_1515035441414_275503 (state: FAILED)
> 18/01/17 02:15:34 INFO yarn.Client:
>          client token: N/A
>          diagnostics: Application application_1515035441414_275503 failed 10 
> times due to AM Container for appattempt_1515035441414_275503_000010 exited 
> with  exitCode: -1000
> For more detailed output, check the application tracking page: 
> http://xxxxx:8088/cluster/app/application_1515035441414_275503 Then click on 
> links to logs of each attempt.
> Diagnostics: ViewFs: Cannot initialize: Empty Mount table in config for 
> viewfs://default/
> java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config 
> for viewfs://default/
>         at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:337)
>         at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:168)
>         at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:168)
>         at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
>         at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
>         at 
> org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Failing this attempt. Failing the application.
>          ApplicationMaster host: N/A
>          ApplicationMaster RPC port: -1
>          queue: common
>          start time: 1516155295100
>          final status: FAILED
>          tracking URL: 
> http://xxxxxx:8088/cluster/app/application_1515035441414_275503
>          user: nayakxxxx
> 18/01/17 02:15:34 INFO yarn.Client: Deleted staging directory 
> viewfs:/user/nayakxxxx/.sparkStaging/application_1515035441414_275503
> 18/01/17 02:15:34 ERROR spark.SparkContext: Error initializing SparkContext.
> {noformat}


