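If your file format is splittable (for example, uncompressed line-oriented text such as TSV or CSV), Spark will divide the file into partitions and distribute them across the executors instead of processing it as a single chunk.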
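To make "splittable" concrete, here is a minimal sketch in plain Python (no Spark needed) of how a line-oriented file can be cut into byte ranges that are then read independently. The names `plan_splits` and `read_split` are hypothetical, and the rule that a line belongs to the range containing its first byte is a simplified assumption for illustration, not Spark's or Hadoop's actual code.

```python
def plan_splits(file_size, split_size):
    """Cut [0, file_size) into contiguous byte ranges of at most split_size bytes."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_split(data, start, end):
    """Return the lines whose first byte lies in [start, end).

    A line that begins inside the range is read to its end even when it
    crosses the range boundary; a line that began in an earlier range is
    skipped, because the reader of that earlier range owns it.
    """
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        # We landed mid-line: skip forward to the start of the next line.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop].decode("utf-8"))
        pos = stop + 1
    return lines


# A tiny TSV payload and an artificially small split size (assumed values).
data = b"id\tname\n1\talpha\n2\tbeta\n3\tgamma\n"
splits = plan_splits(len(data), split_size=10)
per_split = [read_split(data, s, e) for s, e in splits]
```

Each range can be handed to a different worker, and every line is still read exactly once even though the boundaries fall in the middle of lines. This works only because a newline marks a record boundary that can be found from any offset, which is what makes TSV and CSV splittable. On a real cluster, `sc.textFile(path).getNumPartitions()` in PySpark reports how many partitions Spark actually created for the file.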
On Sat, Sep 3, 2016 at 3:38 PM, Somasundaram Sekar <somasundar.se...@tigeranalytics.com> wrote:

> Hi All,
>
> Would like to gain some understanding on the questions listed below,
>
> 1. When processing a large file with Apache Spark, with, say, sc.textFile("somefile.xml"), does it split it for parallel processing across executors or, will it be processed as a single chunk in a single executor?
>
> 2. When using dataframes, with implicit XMLContext from Databricks is there any optimization prebuilt for such large file processing?
>
> Please help!!!
>
> http://stackoverflow.com/questions/39305310/does-spark-process-large-file-in-the-single-worker
>
> Regards,
> Somasundaram S