Re: skipping header from each file

2015-01-09 Thread Sean Owen
I think this was already answered on Stack Overflow: http://stackoverflow.com/questions/27854919/skipping-header-file-from-each-csv-file-in-spark, where the one additional idea would be this: if there were just one header line, in the first record, then the most efficient way to filter it out is:
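A minimal sketch of that idea. It assumes the only header line is the first record of partition 0. That holds for a single file read with `sc.textFile`, but with several files only the first file's header lands there. The helper `dropHeader` is hypothetical and is the function you would hand to `rdd.mapPartitionsWithIndex`. It is written against plain Scala iterators so it runs without a Spark cluster.

```scala
// Hypothetical helper: drops the first line of partition 0 only.
// Intended use with Spark (assumption: `rdd` is an RDD[String] from sc.textFile):
//   val noHeader = rdd.mapPartitionsWithIndex((idx, it) => dropHeader(idx, it))
def dropHeader(partitionIndex: Int, lines: Iterator[String]): Iterator[String] =
  if (partitionIndex == 0) lines.drop(1) // the header is assumed to be the very first record
  else lines                             // every other partition passes through untouched

// Local stand-in for two partitions of an RDD[String]
val partitions = Seq(
  Seq("id,name", "1,alice", "2,bob"),
  Seq("3,carol", "4,dave")
)

// Apply the helper partition by partition, as mapPartitionsWithIndex would
val result: Seq[String] = partitions.zipWithIndex.flatMap { case (part, idx) =>
  dropHeader(idx, part.iterator).toList
}
// result == Seq("1,alice", "2,bob", "3,carol", "4,dave")
```

Because it only touches one partition and never scans the others, this avoids a per-row comparison across the whole dataset.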

RE: skipping header from each file

2015-01-09 Thread Somnath Pandeya
Maybe you can use the wholeTextFiles method, which returns the file name and content of each file as a PairRDD, and then you can remove the first line from each file.

-----Original Message-----
From: Hafiz Mujadid [mailto:hafizmujadi...@gmail.com]
Sent: Friday, January 09, 2015 11:48 AM
To:
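A sketch of the wholeTextFiles approach. Each element is a (fileName, content) pair, so every file's header can be dropped independently. It assumes newline-separated records and files small enough to be held in memory as one string, which is how wholeTextFiles loads them. The helper `stripFirstLine` is hypothetical and is exercised on a plain Scala sequence standing in for the pair RDD.

```scala
// Hypothetical helper: splits one file's content into lines and drops the first (header) line.
// Intended use with Spark (assumption: `path` points at a directory of CSV files):
//   val rows = sc.wholeTextFiles(path).flatMap { case (_, content) => stripFirstLine(content) }
def stripFirstLine(content: String): Seq[String] =
  content
    .split("\r?\n") // handle both Unix and Windows line endings
    .toSeq
    .drop(1)        // remove this file's header line

// Local stand-in for the (fileName, content) pairs returned by wholeTextFiles
val files = Seq(
  ("a.csv", "id,name\n1,alice\n2,bob\n"),
  ("b.csv", "id,name\r\n3,carol\r\n")
)

val rows: Seq[String] = files.flatMap { case (_, content) => stripFirstLine(content) }
// rows == Seq("1,alice", "2,bob", "3,carol")
```

The trade-off is that each file is read as a single record, so this suits many small files better than a few very large ones.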

Re: skipping header from each file

2015-01-08 Thread Akhil Das
Did you try something like:

    val file = sc.textFile("/home/akhld/sigmoid/input")
    val skipped = file.filter(row => !row.contains("header"))
    skipped.take(10).foreach(println)

Thanks
Best Regards

On Fri, Jan 9, 2015 at 11:48 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote:
Suppose I
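A `contains` check also drops any data row that happens to include the matched text. A slightly stricter variant, assuming every input file starts with exactly the same header line, is to take the first line and drop all rows equal to it. With Spark that would be roughly `val header = file.first()` followed by `file.filter(_ != header)`. The helper `removeHeaderRows` is hypothetical and is shown on a local sequence.

```scala
// Hypothetical helper mirroring `file.filter(row => row != header)` on an RDD[String].
// Assumption: every file repeats exactly the same header line, so an equality test removes
// all header copies while keeping data rows that merely contain similar text.
def removeHeaderRows(lines: Seq[String]): Seq[String] =
  lines.headOption match {
    case Some(header) => lines.filter(_ != header) // drop every line identical to the first one
    case None         => lines                     // empty input: nothing to remove
  }

// Local stand-in for the lines of two concatenated CSV files
val lines = Seq("id,note", "1,header missing", "id,note", "2,ok")
val cleaned = removeHeaderRows(lines)
// cleaned == Seq("1,header missing", "2,ok")
```

Note that a genuine data row identical to the header would also be removed, so this only fits inputs where that cannot happen.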