Thanks for the suggestion, Cheng. I will try that today.
Are there any implications when reading the Parquet data if no
summary files are present?
Michael
On Sat, Jul 25, 2015 at 2:28 AM, Cheng Lian <lian.cs@gmail.com> wrote:
> The time is probably spent by ParquetOutputFormat.commitJob.
>> Hi,
>>
>> We are converting some CSV log files to Parquet, but the job is getting
>> progressively slower the more files we add to the Parquet folder.
>> The Parquet files are being written to S3; we are using a Spark
>> standalone cluster running on EC2, and the Spark version is 1.4.1. The
>> Parquet files are
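
[Editor's note: the thread does not spell out Cheng's earlier suggestion. A common workaround from this era for slow ParquetOutputFormat.commitJob on S3 was to turn off Parquet summary-file generation via the parquet-mr job property. A minimal sketch, assuming a Spark shell with a SparkContext named sc:]

```scala
// Sketch: disable generation of Parquet summary files (_metadata and
// _common_metadata) so commitJob does not have to read and merge the
// footer of every part-file in the S3 folder after each write.
// "parquet.enable.summary-metadata" is the parquet-mr property backing
// ParquetOutputFormat.ENABLE_JOB_SUMMARY.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
```

[With summary files absent, readers fall back to reading individual part-file footers, so whether this affects read-side behavior depends on the reader; treat that as something to verify for your Spark version.]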