Where is argsList defined? Is Launcher.main() thread-safe? Note that if multiple folders are processed on a node, multiple threads may run concurrently in the executor, each processing one folder.
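One way to sidestep the shared-state concern is to copy the argument array per task instead of mutating the captured argsList in place. A minimal sketch (plain Java, Spark wiring omitted; the class and method names here are hypothetical, and the indices 3 and 7 follow the snippet below):

```java
import java.util.Arrays;

public class ArgsCopyExample {
    // Build a per-folder copy of the shared argument array so that
    // concurrent tasks never see each other's path mutations.
    public static String[] argsForFolder(String[] argsList, String folder) {
        String[] args = Arrays.copyOf(argsList, argsList.length);
        args[3] = args[3] + "/" + folder;          // input path, local to this task
        args[7] = args[7] + "/" + folder + ".csv"; // output path, local to this task
        return args;
    }

    public static void main(String[] args) {
        String[] shared = new String[8];
        shared[3] = "/home/user/software/data";
        shared[7] = "/home/user/software/out";

        String[] taskArgs = argsForFolder(shared, "20161307");
        System.out.println(taskArgs[3]); // /home/user/software/data/20161307
        System.out.println(taskArgs[7]); // /home/user/software/out/20161307.csv
        System.out.println(shared[3]);   // unchanged: /home/user/software/data
    }
}
```

Inside the map function you would then call Launcher.main(argsForFolder(argsList, folder)), which also avoids the compounding bug where argsList[3] keeps growing across calls. Whether Launcher.main itself is safe to run concurrently is a separate question.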
> On Jul 14, 2016, at 12:28, Balachandar R.A. <balachandar...@gmail.com> wrote:
>
> Hello Ted,
>
> Thanks for the response. Here is the additional information.
>
> I am using Spark 1.6.1 (spark-1.6.1-bin-hadoop2.6).
>
> Here is the code snippet:
>
> JavaRDD<File> add = jsc.parallelize(listFolders, listFolders.size());
> JavaRDD<Integer> test = add.map(new Function<File, Integer>() {
>     @Override
>     public Integer call(File file) throws Exception {
>         String folder = file.getName();
>         System.out.println("[x] Processing dataset from the directory " + folder);
>         int status = 0;
>         // Full path of the input folder. The input folder is on a shared file
>         // system that every worker node can access, something like
>         // "/home/user/software/data/", and the folder name is like "20161307".
>         argsList[3] = argsList[3] + "/" + folder;
>         // Full path of the output.
>         argsList[7] = argsList[7] + "/" + folder + ".csv";
>         try {
>             // Launcher is a black box. It processes the input folder and creates
>             // a csv file in the output location (argsList[7]), which is also on
>             // a shared file system.
>             Launcher.main(argsList);
>             status = 0;
>         } catch (Exception e) {
>             System.out.println("[x] Execution of import tool for the directory " + folder + " failed");
>             status = 0;
>         }
>         accum.add(1);
>         return status;
>     }
> });
>
> Here is the spark-env.sh:
>
> export SPARK_WORKER_INSTANCES=1
> export JAVA_HOME=/home/work_IW1/opt/jdk1.8.0_77/
> export HADOOP_CONF_DIR=/home/work_IW1/opt/hadoop-2.7.2/etc/hadoop
>
> Here is the spark-defaults.conf:
>
> spark.master              spark://master:7077
> spark.eventLog.enabled    true
> spark.eventLog.dir        hdfs://master:9000/sparkEvent
> spark.serializer          org.apache.spark.serializer.KryoSerializer
> spark.driver.memory       4g
>
> Hope it helps.