Where is argsList defined? Is Launcher.main() thread-safe? Note that if 
multiple folders are processed on a node, multiple threads may run concurrently 
in the executor, each processing a folder.
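
If Launcher.main() holds static state or is otherwise not thread-safe, concurrent 
tasks in the same executor JVM can interfere. A rough sketch of one way around 
both issues (untested; it assumes argsList is a String[] built on the driver):

    JavaRDD<Integer> test = add.map(new Function<File, Integer>() {
        @Override
        public Integer call(File file) throws Exception {
            String folder = file.getName();
            // Work on a per-task copy instead of mutating the captured array,
            // so repeated or concurrent calls never see each other's edits.
            String[] localArgs = argsList.clone();
            localArgs[3] = localArgs[3] + "/" + folder;
            localArgs[7] = localArgs[7] + "/" + folder + ".csv";
            // If Launcher.main() is not thread-safe, serialize the calls
            // within this JVM:
            synchronized (Launcher.class) {
                Launcher.main(localArgs);
            }
            return 0;
        }
    });

The synchronized block costs parallelism within an executor, so drop it if 
Launcher.main() turns out to be thread-safe.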

> On Jul 14, 2016, at 12:28, Balachandar R.A. <balachandar...@gmail.com> wrote:
> 
> Hello Ted, 
> 
> 
> Thanks for the response. Here is the additional information.
>  
> I am using Spark 1.6.1  (spark-1.6.1-bin-hadoop2.6)
>  
> Here is the code snippet
>  
>  
> JavaRDD<File> add = jsc.parallelize(listFolders, listFolders.size());
> JavaRDD<Integer> test = add.map(new Function<File, Integer>() {
>     @Override
>     public Integer call(File file) throws Exception {
>         String folder = file.getName();
>         System.out.println("[x] Processing dataset from the directory " + folder);
>         int status = 0;
>         // Full path of the input folder. The input folder is on a shared file
>         // system that every worker node can access, e.g. "/home/user/software/data/",
>         // and the folder name looks like "20161307".
>         argsList[3] = argsList[3] + "/" + folder;
>         // Full path of the output.
>         argsList[7] = argsList[7] + "/" + folder + ".csv";
>         try {
>             // Launcher is a black box. It processes the input folder and creates
>             // a csv file at the output location (argsList[7]), which is also on
>             // the shared file system.
>             Launcher.main(argsList);
>             status = 0;
>         }
>         catch (Exception e) {
>             System.out.println("[x] Execution of import tool for the directory " + folder + " failed");
>             status = 0;
>         }
>         accum.add(1);
>         return status;
>     }
> });
>  
>  
> Here is the spark-env.sh
>  
> export SPARK_WORKER_INSTANCES=1
> export JAVA_HOME=/home/work_IW1/opt/jdk1.8.0_77/
> export HADOOP_CONF_DIR=/home/work_IW1/opt/hadoop-2.7.2/etc/hadoop
>  
> Here is the spark-defaults.conf
>  
>  
>   spark.master                     spark://master:7077
>   spark.eventLog.enabled           true
>   spark.eventLog.dir               hdfs://master:9000/sparkEvent
>   spark.serializer                 org.apache.spark.serializer.KryoSerializer
>   spark.driver.memory              4g
>  
> 
> 
> Hope it helps. 
