Hello Ted,
Thanks for the response. Here is the additional information.

I am using Spark 1.6.1 (spark-1.6.1-bin-hadoop2.6).

Here is the code snippet:

JavaRDD<File> add = jsc.parallelize(listFolders, listFolders.size());
JavaRDD<Integer> test = add.map(new Function<File, Integer>() {
    @Override
    public Integer call(File file) throws Exception {
        String folder = file.getName();
        System.out.println("[x] Processing dataset from the directory " + folder);
        int status = 0;
        // Full path of the input folder. The input folder is on a shared file
        // system that every worker node can access, e.g. "/home/user/software/data/",
        // and the folder name looks like "20161307". Build per-call copies instead
        // of mutating the shared argsList, so paths do not compound across calls.
        String[] args = java.util.Arrays.copyOf(argsList, argsList.length);
        args[3] = argsList[3] + "/" + folder;
        // Full path of the output CSV file, also on the shared file system.
        args[7] = argsList[7] + "/" + folder + ".csv";
        try {
            // Launcher is a black box: it processes the input folder and
            // creates a CSV file at the output location (args[7]).
            Launcher.main(args);
            status = 0;
        } catch (Exception e) {
            System.out.println("[x] Execution of import tool for the directory " + folder + " failed");
            status = 1; // non-zero, so failures are distinguishable from successes
        }
        accum.add(1);
        return status;
    }
});

Here is the spark-env.sh:

export SPARK_WORKER_INSTANCES=1
export JAVA_HOME=/home/work_IW1/opt/jdk1.8.0_77/
export HADOOP_CONF_DIR=/home/work_IW1/opt/hadoop-2.7.2/etc/hadoop

Here is the spark-defaults.conf:

spark.master             spark://master:7077
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://master:9000/sparkEvent
spark.serializer         org.apache.spark.serializer.KryoSerializer
spark.driver.memory      4g

Hope it helps.
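The per-folder argument construction in the snippet can be sketched in isolation (no Spark required). This is a minimal illustration, assuming, as in the snippet, that index 3 holds the input-folder base path and index 7 the output base path; the class and method names here are hypothetical:

```java
import java.util.Arrays;

public class PathArgs {
    // Build a per-folder copy of the argument template so the shared array
    // is never mutated across calls; indices 3 and 7 are assumed to hold
    // the input-folder and output-CSV base paths.
    static String[] argsFor(String[] template, String folder) {
        String[] args = Arrays.copyOf(template, template.length);
        args[3] = template[3] + "/" + folder;
        args[7] = template[7] + "/" + folder + ".csv";
        return args;
    }

    public static void main(String[] unused) {
        String[] template = new String[8];
        template[3] = "/home/user/software/data";
        template[7] = "/home/user/software/out";
        String[] a = argsFor(template, "20161307");
        System.out.println(a[3]); // /home/user/software/data/20161307
        System.out.println(a[7]); // /home/user/software/out/20161307.csv
    }
}
```

Copying the template on every call matters inside map(): tasks for several folders can run in the same executor JVM, and appending to a shared array would make each successive path contain all previous folder names.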