Re: Importing large file with SparkContext.textFile

2016-09-03 Thread Somasundaram Sekar
If the file is not splittable (can I assume the log file is splittable, though?), can you advise on how Spark handles such a case? If Spark can't split it, what is the widely used practice? On 3 Sep 2016 7:29 pm, "Raghavendra Pandey" wrote: If your file format is splittable, say
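[A minimal sketch of the common workaround for a non-splittable file such as a gzipped log, assuming an existing SparkContext `sc`; the path and partition count below are made up. The read itself runs in one task, so the usual practice is to repartition right after reading so the rest of the job runs in parallel.]

// Assumed: an existing SparkContext `sc`; the file path is hypothetical.
// A .gz file is not splittable, so the read happens in a single task.
val raw = sc.textFile("logs/app.log.gz")
println(s"partitions after read: ${raw.getNumPartitions}")      // typically 1

// Widely used practice: repartition right after the read (one shuffle)
// so all later transformations run in parallel across executors.
val spread = raw.repartition(64)
println(s"partitions after repartition: ${spread.getNumPartitions}")

val errors = spread.filter(_.contains("ERROR")).count()
println(s"error lines: $errors")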

Re: Importing large file with SparkContext.textFile

2016-09-03 Thread Raghavendra Pandey
If your file format is splittable, say TSV, CSV etc., it will be distributed across all executors. On Sat, Sep 3, 2016 at 3:38 PM, Somasundaram Sekar <somasundar.se...@tigeranalytics.com> wrote: > Hi All, > Would like to gain some understanding on the questions listed below, > 1.
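[A small illustration of this point, with hypothetical file paths and an assumed SparkContext `sc`: a plain-text/TSV file comes back as many partitions, roughly one per HDFS block, or more if a larger minPartitions is requested, while the same data gzipped comes back as a single partition.]

// Assumed: an existing SparkContext `sc`; paths are hypothetical.
// Splittable plain text / TSV: one partition per HDFS block, or more if a
// larger minPartitions is requested; partitions are scheduled across executors.
val tsv = sc.textFile("data/events.tsv", minPartitions = 32)
println(s"TSV partitions: ${tsv.getNumPartitions}")

// The same data gzipped is not splittable, so it arrives as one partition.
val gz = sc.textFile("data/events.tsv.gz")
println(s"gz partitions: ${gz.getNumPartitions}")                // 1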

Importing large file with SparkContext.textFile

2016-09-03 Thread Somasundaram Sekar
Hi All, Would like to gain some understanding on the questions listed below: 1. When processing a large file with Apache Spark, with, say, sc.textFile("somefile.xml"), does it split it for parallel processing across executors, or will it be processed as a single chunk in a single executor?
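[For reference, one way to answer question 1 empirically, assuming a SparkContext named `sc` is already available: check how many partitions textFile produced and how the lines are spread across them.]

// Assumed: an existing SparkContext `sc`; "somefile.xml" is the file from the question.
val lines = sc.textFile("somefile.xml")

// For an uncompressed file this is roughly one partition per HDFS block;
// a non-splittable compressed file would show a single partition instead.
println(s"number of partitions: ${lines.getNumPartitions}")

// Line count per partition, to see how the data is spread across tasks.
lines.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .collect()
  .foreach { case (idx, n) => println(s"partition $idx -> $n lines") }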