Hi, I have a set of input files for a Spark program, with each file corresponding to a logical data partition. What is the API/mechanism to assign each input file (or a set of files) to a Spark partition when initializing RDDs?
When I create a Spark RDD pointing to the directory of files, my understanding is that it's not guaranteed each input file will be treated as a separate partition. My job's semantics require that the data is partitioned, and I want to leverage the partitioning that has already been done rather than repartitioning again in the Spark job. I tried to look this up online but haven't found any pointers so far.

Thanks,
pala