Roy - can you check if you have HADOOP_CONF_DIR and YARN_CONF_DIR set to the 
directory containing the HDFS and YARN configuration files?
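
For example, on a typical HDP layout (the path is an assumption; adjust for your cluster), you would export these before running spark-submit:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf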

From: Sandeep Nemuri <nhsande...@gmail.com>
Date: Monday, March 27, 2017 at 9:44 AM
To: Saisai Shao <sai.sai.s...@gmail.com>
Cc: Yong Zhang <java8...@hotmail.com>, ", Roy" <rp...@njit.edu>, user 
<user@spark.apache.org>
Subject: Re: spark-submit config via file

You should try adding your NN host and port in the URL.
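
For example, something like this (8020 is the usual NameNode RPC port; substitute your actual NN host):

spark.yarn.archive hdfs://<your-nn-host>:8020/hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz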

On Mon, Mar 27, 2017 at 11:03 AM, Saisai Shao 
<sai.sai.s...@gmail.com> wrote:
It's quite obvious your HDFS URL is not complete; please look at the 
exception: your HDFS URI doesn't have a host and port. Normally the short 
form would be OK if HDFS were your default FS.

I think the problem is that you're running on HDI, where the default FS is 
wasb. So here a short name without host:port will lead to an error. This 
looks like an HDI-specific issue; you'd better ask the HDI folks.
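
For illustration, on HDI core-site.xml typically has something like this 
(container and account names are placeholders), which is why a bare 
hdfs:/// path has no NameNode to resolve against:

<property>
  <name>fs.defaultFS</name>
  <value>wasb://<container>@<account>.blob.core.windows.net</value>
</property>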


Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)





On Fri, Mar 24, 2017 at 9:18 PM, Yong Zhang 
<java8...@hotmail.com> wrote:

Of course it is possible.



You can always set any configuration in your application through the API, 
instead of passing it in through the CLI.



import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName(properties.getProperty("appName"))
  .setMaster(properties.getProperty("master"))
  .set("xxx", properties.getProperty("xxx"))

Your error is an environment problem.

Yong
________________________________
From: ", Roy" <rp...@njit.edu>
Sent: Friday, March 24, 2017 7:38 AM
To: user
Subject: spark-submit config via file

Hi,

I am trying to deploy a Spark job using spark-submit, which has a bunch of 
parameters, like:

spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode 
cluster --executor-memory 3072m --executor-cores 4 --files streaming.conf 
spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"

I was looking for a way to put all these flags in a file to pass to 
spark-submit, to make my spark-submit command simpler, like this:

spark-submit --class StreamingEventWriterDriver --master yarn --deploy-mode 
cluster --properties-file properties.conf --files streaming.conf 
spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar -conf "streaming.conf"

properties.conf has the following contents:



spark.executor.memory 3072m

spark.executor.cores 4



But I am getting the following error:



17/03/24 11:36:26 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
17/03/24 11:36:26 WARN AzureFileSystemThreadPoolExecutor: Disabling threads for Delete operation as thread count 0 is <= 1
17/03/24 11:36:26 INFO AzureFileSystemThreadPoolExecutor: Time taken for Delete operation is: 1 ms with threads: 0
17/03/24 11:36:27 INFO Client: Deleted staging directory wasb://a...@abc.blob.core.windows.net/user/sshuser/.sparkStaging/application_1488402758319_0492
Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:364)
        at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:480)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:552)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1218)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1277)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/03/24 11:36:27 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...

Does anyone know if this is even possible?



Thanks...

Roy




--
  Regards
  Sandeep Nemuri
