Re: Reading large set of files in Spark

2016-02-04 Thread Ted Yu
For question #2, see the following method of FileSystem:

  public abstract boolean delete(Path f, boolean recursive) throws IOException;

FYI

On Thu, Feb 4, 2016 at 10:58 AM, Akhilesh Pathodia <pathodia.akhil...@gmail.com> wrote:
> Hi,
>
> I am using Spark to read large set of files from
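The method Ted points to is Hadoop's `FileSystem.delete(Path, boolean)`; passing `true` for the second argument removes a directory and everything beneath it. The real call needs a configured `FileSystem` instance (shown only in the comments below). As a self-contained sketch of the same recursive-delete semantics using only the JDK on a local directory:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecursiveDelete {
    // On HDFS the equivalent is (assuming a configured Hadoop client):
    //   FileSystem fs = FileSystem.get(conf);
    //   boolean ok = fs.delete(new Path("/some/dir"), true); // true = recursive
    static boolean deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return false; // mirrors FileSystem.delete returning false for a missing path
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Deepest paths first, so children are deleted before their parents.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spark-cleanup-demo");
        Files.createDirectories(dir.resolve("sub"));
        Files.writeString(dir.resolve("sub").resolve("part-00000"), "record\n");
        System.out.println("deleted: " + deleteRecursively(dir));
        System.out.println("exists after: " + Files.exists(dir));
    }
}
```

This is an analogy for the local filesystem only; on HDFS you would use the `FileSystem.delete` call from the signature above.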

Reading large set of files in Spark

2016-02-04 Thread Akhilesh Pathodia
Hi,

I am using Spark to read a large set of files from HDFS, applying some formatting to each line and then saving each line as a record in Hive. Spark is reading the directory paths from Kafka. Each directory can contain a large number of files. I am reading one path from Kafka and then processing all
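The per-line transformation described here (read every file in a directory, format each line, emit one record per line) is in Spark roughly `sc.textFile(dirPath).map(formatLine)` followed by a write into Hive. A plain-JDK sketch of that per-line shape on a local directory, with a hypothetical `formatLine` standing in for the job's real formatting logic (no Spark, Kafka, or Hive involved):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerLineFormatting {
    // Hypothetical formatter; the actual job would apply its own logic here.
    static String formatLine(String line) {
        return line.trim().toUpperCase();
    }

    // Read every regular file under dir and apply formatLine to each line,
    // mimicking sc.textFile(dir).map(formatLine) on a local directory.
    static List<String> formatDirectory(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .flatMap(p -> {
                            try {
                                return Files.lines(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .map(PerLineFormatting::formatLine)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("per-line-demo");
        Files.writeString(dir.resolve("part-00000"), "alpha\nbeta\n");
        System.out.println(formatDirectory(dir));
    }
}
```

In the actual job each formatted line would be written as a Hive record rather than collected into a list; this only illustrates the one-record-per-line mapping.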