Files specified through --files are downloaded by the init-container to
/var/spark-data/spark-files by default. So in your case, the file should be
available inside the container at the local path
/var/spark-data/spark-files/flights.csv, and that is the path to pass to the
script instead of the hdfs:// URI.
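
For illustration, here is a minimal shell sketch of how the local path follows
from the remote URI. It assumes the default download directory mentioned above
has not been overridden; the variable names are hypothetical and exist only for
this example.

```shell
# Default init-container download directory (assumed not overridden).
FILES_DIR=/var/spark-data/spark-files

# The remote file passed to --files in the original command.
REMOTE_FILE=hdfs://192.168.0.1:8020/user/jhoole/flights.csv

# The file keeps its base name when it is placed in the download directory.
LOCAL_FILE="$FILES_DIR/$(basename "$REMOTE_FILE")"

# Path to hand to data-manipulation.R as its argument.
echo "$LOCAL_FILE"
```

The printed path would then replace the trailing hdfs:// argument after
data-manipulation.R, while --files keeps pointing at the HDFS location.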

On Mon, Feb 26, 2018 at 10:51 AM, Jenna Hoole <jenna.ho...@gmail.com> wrote:

> This is probably stupid user error, but I can't for the life of me figure
> out how to access the files that are staged by the init-container.
>
> I'm trying to run the SparkR example data-manipulation.R, which requires
> the path to its data file. I supply the HDFS location via --files and then
> pass the full HDFS path as the script argument.
>
>
> --files hdfs://192.168.0.1:8020/user/jhoole/flights.csv
> local:///opt/spark/examples/src/main/r/data-manipulation.R
> hdfs://192.168.0.1:8020/user/jhoole/flights.csv
>
> The init-container seems to load my file.
>
> 18/02/26 18:29:09 INFO spark.SparkContext: Added file
> hdfs://192.168.0.1:8020/user/jhoole/flights.csv at
> hdfs://192.168.0.1:8020/user/jhoole/flights.csv with timestamp 1519669749519
>
> 18/02/26 18:29:09 INFO util.Utils: Fetching
> hdfs://192.168.0.1:8020/user/jhoole/flights.csv to
> /var/spark/tmp/spark-d943dae6-9b95-4df0-87a3-9f7978d6d4d2/userFiles-4112b7aa-b9e7-47a9-bcbc-7f7a01f93e38/fetchFileTemp7872615076522023165.tmp
>
> However, I get an error that my file does not exist.
>
> Error in file(file, "rt") : cannot open the connection
> Calls: read.csv -> read.table -> file
> In addition: Warning message:
> In file(file, "rt") :
>   cannot open file 'hdfs://192.168.0.1:8020/user/jhoole/flights.csv': No such file or directory
> Execution halted
> Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
>         at org.apache.spark.deploy.RRunner$.main(RRunner.scala:104)
>         at org.apache.spark.deploy.RRunner.main(RRunner.scala)
>
> If I try supplying just flights.csv, I get a different error:
>
> --files hdfs://192.168.0.1:8020/user/jhoole/flights.csv
> local:///opt/spark/examples/src/main/r/data-manipulation.R flights.csv
>
> Error: Error in loadDF : analysis error - Path does not exist:
> hdfs://192.168.0.1:8020/user/root/flights.csv;
> Execution halted
> Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
>         at org.apache.spark.deploy.RRunner$.main(RRunner.scala:104)
>         at org.apache.spark.deploy.RRunner.main(RRunner.scala)
>
> If the path /user/root/flights.csv does exist and I only supply
> "flights.csv" as the file path, it runs to completion successfully.
> However, if I provide the file path as
> "hdfs://192.168.0.1:8020/user/root/flights.csv", I get the same
> "No such file or directory" error as I did initially.
>
> Since I obviously can't put all my hdfs files under /user/root, how do I
> get it to use the file that the init-container is fetching?
>
> Thanks,
> Jenna
>
