spark streaming - how to purge old data files in data directory

2016-06-18 Thread Vamsi Krishna
Hi, I'm on an HDP 2.3.2 cluster (Spark 1.4.1). I have a Spark Streaming app which uses 'textFileStream' to stream and process simple CSV files. I see that old data files that have already been processed are left in the data directory. What is the right way to purge the old data files in the data directory on HDFS?
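(Spark does not delete source files for you, so one common approach is a periodic cleanup job. Below is a minimal local-filesystem sketch of the purge logic; the directory name and age threshold are hypothetical, and on HDFS the same idea would be applied with the Hadoop FileSystem API or a scheduled `hdfs dfs -rm` job instead of `os.remove`.)

```python
import os
import tempfile
import time

def purge_old_files(data_dir, max_age_seconds):
    """Delete files in data_dir whose modification time is older
    than max_age_seconds, and return the names removed."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        # Only plain files are candidates; skip subdirectories.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Keeping the threshold comfortably larger than the streaming batch interval avoids deleting a file the receiver has not picked up yet.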

Re: how to load compressed (gzip) csv file using spark-csv

2016-06-16 Thread Vamsi Krishna
Thanks. It works.

On Thu, Jun 16, 2016 at 5:32 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> It will 'auto-detect' the compression codec by the file extension and then will decompress and read it correctly.
>
> Thanks!
>
> 2016-06-16 20:27 GMT+09:00 Vamsi Krishna
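(The key point in the reply is that the reader picks the decompression codec from the file extension, so a '.csv.gz' file needs no extra option. A minimal plain-Python sketch of that extension-based dispatch, with a hypothetical helper name, looks like this.)

```python
import csv
import gzip
import os
import tempfile

def read_csv_rows(path):
    """Read all rows from a CSV file, decompressing transparently
    when the extension indicates gzip."""
    # Choose the opener by suffix, mirroring how the codec is
    # auto-detected from the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as f:
        return list(csv.reader(f))
```

The same call works for both `data.csv` and `data.csv.gz`, which is the behavior the reply describes for spark-csv.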

how to load compressed (gzip) csv file using spark-csv

2016-06-16 Thread Vamsi Krishna
Hi, I'm using Spark 1.4.1 (HDP 2.3.2). As per the spark-csv documentation (https://github.com/databricks/spark-csv), I see that we can write to a csv file in compressed form using the 'codec' option. But I didn't see support for a 'codec' option when reading a csv file. Is there a way to read a