This is probably a stupid user error, but I can't for the life of me figure
out how to access the files that are staged by the init-container.

I'm trying to run the SparkR example data-manipulation.R, which requires the
path to its data file. I supply the HDFS location via --files and then pass
the full HDFS path as the application argument.

--files hdfs://
local:///opt/spark/examples/src/main/r/data-manipulation.R hdfs://

The init-container seems to load my file.

18/02/26 18:29:09 INFO spark.SparkContext: Added file hdfs:// at hdfs:// with timestamp 1519669749519

18/02/26 18:29:09 INFO util.Utils: Fetching hdfs:// to

However, I get an error that my file does not exist.

Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'hdfs://': No such file or directory
Execution halted
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.RRunner$.main(RRunner.scala:104)
    at org.apache.spark.deploy.RRunner.main(RRunner.scala)

If I try supplying just flights.csv, I get a different error:

--files hdfs://
local:///opt/spark/examples/src/main/r/data-manipulation.R flights.csv

Error: Error in loadDF : analysis error - Path does not exist: hdfs://;
Execution halted
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.RRunner$.main(RRunner.scala:104)
    at org.apache.spark.deploy.RRunner.main(RRunner.scala)

If the path /user/root/flights.csv does exist and I only supply
"flights.csv" as the file path, it runs to completion successfully.
However, if I provide the file path as "hdfs://", I get the same "No such
file or directory" error as I did initially.

Since I obviously can't put all my HDFS files under /user/root, how do I
get the script to use the file that the init-container is fetching?

