>> What is your use case? If you want to load the files and continue
>> processing in parallel, then a simple .map should work. If you want to
>> execute arbitrary code based on the list of files that each executor
>> received, then you need to use .foreach, which will get executed for
>> each of the entries, on the worker.
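>>
>> A quick sketch of the difference, assuming files is a JavaRDD<String>
>> of paths (Java 7 syntax; loadFile and process are placeholders for
>> your own code):
>>
>>   import org.apache.spark.api.java.JavaRDD;
>>   import org.apache.spark.api.java.function.Function;
>>   import org.apache.spark.api.java.function.VoidFunction;
>>
>>   // .map: load each file on the workers and keep the results as an RDD
>>   JavaRDD<byte[]> contents = files.map(new Function<String, byte[]>() {
>>     public byte[] call(String path) throws Exception {
>>       return loadFile(path);  // placeholder for your loading code
>>     }
>>   });
>>
>>   // .foreach: run arbitrary code once per entry, on the workers
>>   files.foreach(new VoidFunction<String>() {
>>     public void call(String path) throws Exception {
>>       process(path);  // placeholder, executed on the executor
>>     }
>>   });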
>>>
>>> On Wed, Oct 28, 2015 at 8:29 PM Adrian Tanase <atan...@adobe.com> wrote:
>>>
>>>> The first line is distributing your fileList variable in the cluster as
>>>> an RDD, partitioned using the default partitioner settings (e.g. the
>>>> number of cores in your cluster).
>>>>
>>>> Each of your workers would get one or more slices of the data (depending
>>>> on how many cores each executor has); the abstraction is called a
>>>> partition.
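>>>>
>>>> Roughly, in code (a sketch; sc is the JavaSparkContext, and the explicit
>>>> slice count is just for illustration):
>>>>
>>>>   // distribute fileList across the cluster as an RDD with 8 partitions
>>>>   JavaRDD<String> files = sc.parallelize(fileList, 8);
>>>>   // with no second argument, spark.default.parallelism is used instead
>>>>   System.out.println(files.partitions().size());  // prints 8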
>
> What is your use case? If you want to load the files and continue
> processing in parallel, then a simple .map should work. If you want to
> execute arbitrary code based on the list of files that each executor
> received, then you need to use .foreach, which will get executed for
> each of the entries, on the worker.
>
> -adrian
>
> From: Vinoth Sankar
> Date: Wednesday, October 28, 2015 at 2:49 PM
> To: "user@spark.apache.org"
> Subject: How do I parallelize Spark Jobs at Executor Level.
>
> Hi,
>
> I'm reading and filtering a large number of files using Spark. It's
> getting parallelized at the Spark driver level only. How do I make it
> parallelize at the executor (worker) level? Refer to the following
> sample. Is there any way to iterate the localIterator in parallel?
>
> Note: I use Java 1.7.
>
> JavaRDD
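>
> Roughly, the driver-side pattern being described, and an executor-side
> alternative, look like this (a sketch only, since the sample is cut off
> above; sc is assumed to be the JavaSparkContext, fileList the list of
> paths, and matches and process are placeholder methods):
>
>   import java.util.Iterator;
>   import org.apache.spark.api.java.JavaRDD;
>   import org.apache.spark.api.java.function.Function;
>   import org.apache.spark.api.java.function.VoidFunction;
>
>   JavaRDD<String> files = sc.parallelize(fileList);
>
>   JavaRDD<String> filtered = files.filter(new Function<String, Boolean>() {
>     public Boolean call(String path) throws Exception {
>       return matches(path);  // placeholder filter condition
>     }
>   });
>
>   // Driver-side: toLocalIterator() streams the results back, so this
>   // loop runs sequentially on the driver
>   Iterator<String> it = filtered.toLocalIterator();
>   while (it.hasNext()) {
>     process(it.next());  // placeholder, executed on the driver
>   }
>
>   // Executor-side: expressing the same work as .foreach keeps it on
>   // the workers and runs it in parallel
>   filtered.foreach(new VoidFunction<String>() {
>     public void call(String path) throws Exception {
>       process(path);  // placeholder, executed on the executors
>     }
>   });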