Make sure 
"/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv" is 
accessible at the same local path on your slave node; with a file:// URL, each 
executor reads the file from its own local filesystem.
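
A minimal sketch of one way to do that (the worker hostname and passwordless SSH 
access are assumptions; substitute your own values):

```sh
# Hypothetical worker address; substitute your slave node's host.
WORKER=ip-xx-ppp-vv-ddd
CSV=/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv

ssh "$WORKER" "mkdir -p '$(dirname "$CSV")'"   # create the directory on the worker
scp "$CSV" "$WORKER:$CSV"                      # copy the CSV to the same path
ssh "$WORKER" "ls -lh '$CSV'"                  # confirm it is readable there
```

Alternatively, put the file on shared storage (HDFS or S3) and point read.df at 
that URL, so every executor can reach it without local copies.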
--
Ali

On Nov 9, 2015, at 6:06 PM, Sanjay Subramanian 
<sanjaysubraman...@yahoo.com.INVALID> wrote:

> hey guys
> 
> I have a 2 node SparkR (1 master, 1 slave) cluster on AWS using 
> spark-1.5.1-bin-without-hadoop.tgz
> 
> Running the SparkR job on the master node 
> 
> /opt/spark-1.5.1-bin-hadoop2.6/bin/sparkR --master \
>   spark://ip-xx-ppp-vv-ddd:7077 \
>   --packages com.databricks:spark-csv_2.10:1.2.0 \
>   --executor-cores 16 --num-executors 8 \
>   --executor-memory 8G --driver-memory 8g \
>   myRprogram.R
> 
> 
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 
> in stage 1.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1.0 
> (TID 103, xx.ff.rr.tt): java.io.FileNotFoundException: File 
> file:/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv
>  does not exist
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>       at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
>       at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
>       at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
>       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>       at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecord
> 
> myRprogram.R
> 
> library(SparkR)
> 
> sc <- sparkR.init(appName="SparkR-CancerData-example")
> sqlContext <- sparkRSQL.init(sc)
> 
> lds <- read.df(sqlContext, 
> "file:///mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv",
>  "com.databricks.spark.csv", header="true")
> # sink() expects a plain local path, not a file:// URL
> sink("/mnt/local/1024gbxvdf1/leads_new_data_analyis.txt")
> summary(lds)
> sink()  # restore output to the console
> 
> 
> This used to run when we had a single-node SparkR installation.
> 
> regards
> 
> sanjay
> 
> 
