Re: Parquet writing gets progressively slower

2015-07-25, Michael Kelly
Thanks for the suggestion, Cheng. I will try that today. Are there any implications when reading the Parquet data if no summary files are present?

Michael

On Sat, Jul 25, 2015 at 2:28 AM, Cheng Lian <lian.cs@gmail.com> wrote:
> The time is probably spent by ParquetOutputFormat.commitJob.
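Why would commitJob get slower as the folder grows? The following is a rough sketch, not Parquet or Spark code. It assumes, based on my reading of parquet-mr from that era and not on anything stated in this thread, that when summary files are enabled the commit step reads the footer of every Parquet file already in the output folder, old and new, to build `_metadata` and `_common_metadata`. Under that assumption, each append also pays for every earlier file. Per-job commit cost then grows linearly, and cumulative cost grows quadratically. The function name and numbers are hypothetical. The model counts footer reads only and ignores S3 listing and latency.

```python
"""Hypothetical cost model: footer reads performed at job commit when
every job appends new part files to the same Parquet folder."""

from typing import List


def footer_reads_per_job(files_per_job: int, num_jobs: int,
                         write_summary: bool = True) -> List[int]:
    """Return the number of footers each job's commit step reads.

    Assumption (not verified against a specific parquet-mr version):
    with summary files enabled, commit reads the footer of every file
    currently in the folder; with summaries disabled it reads none.
    """
    reads = []
    files_in_folder = 0
    for _ in range(num_jobs):
        files_in_folder += files_per_job  # this job's new part files land
        reads.append(files_in_folder if write_summary else 0)
    return reads


if __name__ == "__main__":
    per_job = footer_reads_per_job(files_per_job=100, num_jobs=5)
    print(per_job)       # [100, 200, 300, 400, 500]: per-job cost grows linearly
    print(sum(per_job))  # 1500: total cost grows quadratically
```

If this is the cause, summary-file writing can be switched off by setting the Hadoop property `parquet.enable.summary-metadata` to `false`. The `write_summary=False` case models that setting.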

Parquet writing gets progressively slower

2015-07-24, Michael Kelly
Hi,

We are converting some CSV log files to Parquet, but the job gets progressively slower the more files we add to the Parquet folder. The Parquet files are written to S3. We are using a Spark standalone cluster running on EC2, and the Spark version is 1.4.1. The Parquet files are