>
> Hello Ted,
>

Thanks for the response. Here is the additional information.


> I am using Spark 1.6.1 (spark-1.6.1-bin-hadoop2.6).
>
> Here is the code snippet:
>
> JavaRDD<File> add = jsc.parallelize(listFolders, listFolders.size());
>
> JavaRDD<Integer> test = add.map(new Function<File, Integer>() {
>     @Override
>     public Integer call(File file) throws Exception {
>         String folder = file.getName();
>         System.out.println("[x] Processing dataset from the directory " + folder);
>         int status = 0;
>
>         // Full path of the input folder. The input folder is on a shared
>         // file system that every worker node can access, e.g.
>         // "/home/user/software/data/", and the folder name looks like "20161307".
>         argsList[3] = argsList[3] + "/" + folder;
>
>         // Full path of the output.
>         argsList[7] = argsList[7] + "/" + folder + ".csv";
>
>         try {
>             // The Launcher class is a black box: it processes the input folder
>             // and creates a CSV file at the output location (argsList[7]),
>             // which is also on the shared file system.
>             Launcher.main(argsList);
>             status = 0;
>         } catch (Exception e) {
>             System.out.println("[x] Execution of import tool for the directory " + folder + " failed");
>             status = 1;  // use a nonzero status so failures are distinguishable from success
>         }
>         accum.add(1);
>         return status;
>     }
> });
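One detail worth flagging in the snippet above: the map function mutates the captured argsList array in place, so if the same deserialized function object processes more than one element in a single executor JVM, the folder names keep getting appended onto each other. A minimal plain-Java sketch of that pitfall (class and folder names here are hypothetical, purely for illustration):

```java
public class ArgsMutationDemo {
    public static void main(String[] args) {
        // Shared array, analogous to the captured argsList in the Spark closure.
        String[] argsList = new String[] { "/home/user/software/data" };

        // Simulate the same function object handling two folders in one JVM.
        for (String folder : new String[] { "20161307", "20161308" }) {
            argsList[0] = argsList[0] + "/" + folder;  // in-place append, as in the snippet
            System.out.println(argsList[0]);
        }
        // The second iteration prints ".../20161307/20161308" -- both folders
        // compounded, not the intended ".../20161308". Building a local String
        // per call (String input = basePath + "/" + folder;) avoids this.
    }
}
```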
>
> Here is the spark-env.sh:
>
> export SPARK_WORKER_INSTANCES=1
> export JAVA_HOME=/home/work_IW1/opt/jdk1.8.0_77/
> export HADOOP_CONF_DIR=/home/work_IW1/opt/hadoop-2.7.2/etc/hadoop
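For reference, per-worker resources can also be capped in the same file; SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are standard spark-env.sh variables in standalone mode, and the values below are illustrative assumptions, not taken from the setup above:

```shell
# Illustrative additions to spark-env.sh (values are assumptions):
export SPARK_WORKER_CORES=4      # cores each worker offers to executors
export SPARK_WORKER_MEMORY=8g    # memory each worker offers to executors
```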
>
> Here is the spark-defaults.conf:
>
>   spark.master                     spark://master:7077
>   spark.eventLog.enabled           true
>   spark.eventLog.dir               hdfs://master:9000/sparkEvent
>   spark.serializer                 org.apache.spark.serializer.KryoSerializer
>   spark.driver.memory              4g
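For completeness, the same settings can also be supplied per job on the spark-submit command line instead of spark-defaults.conf; the application jar and main class below are placeholders, not part of the original setup:

```shell
# Hypothetical submission overriding the same settings
# (jar and class names are placeholders):
spark-submit \
  --master spark://master:7077 \
  --driver-memory 4g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.eventLog.enabled=true \
  --class com.example.MyApp \
  my-app.jar
```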
>
>
>


Hope this helps.
