Yes. It improved the performance, and not only with Spark 1.2 but with Spark 1.1 as well. To be precise, the job took more time to run in Spark 1.2 with the default options, but completed in almost the same time as Spark 1.1 when both were run with "lz4".
From: Aaron Davidson <ilike...@gmail.com>
Date: Saturday, 7 February 2015 1:22 am
To: Praveen Garg <praveen.g...@guavus.com>
Cc: Raghavendra Pandey <raghavendra.pan...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Shuffle read/write issue in spark 1.2

Did the problem go away when you switched to lz4? There was a change to the default compression codec from 1.0 to 1.1, where we went from LZF to Snappy. I don't think there was any such change from 1.1 to 1.2, though.

On Fri, Feb 6, 2015 at 12:17 AM, Praveen Garg <praveen.g...@guavus.com> wrote:

We tried changing the compression codec from snappy to lz4. It did improve the performance, but we are still wondering why the default options didn't work as claimed.

From: Raghavendra Pandey <raghavendra.pan...@gmail.com>
Date: Friday, 6 February 2015 1:23 pm
To: Praveen Garg <praveen.g...@guavus.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Shuffle read/write issue in spark 1.2

Even I observed the same issue.

On Fri, Feb 6, 2015 at 12:19 AM, Praveen Garg <praveen.g...@guavus.com> wrote:

Hi,

While moving from Spark 1.1 to Spark 1.2, we are facing an issue where shuffle read/write has increased significantly. We also tried running the job after rolling back to the Spark 1.1 configuration, setting spark.shuffle.manager to hash and spark.shuffle.blockTransferService to nio. It did improve the performance a bit, but it was still much worse than Spark 1.1. The scenario seems similar to a bug raised some time back: https://issues.apache.org/jira/browse/SPARK-5081. Has anyone come across a similar issue?
Please let us know if any configuration change can help.

Regards,
Praveen
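For readers hitting the same issue: the configuration changes discussed in this thread (switching the compression codec to lz4, and rolling back to the Spark 1.1-style hash shuffle manager with the nio block transfer service) can be passed on the command line. A minimal sketch, assuming a Spark 1.2 deployment; the application jar and main class below are placeholders for your own job:

```shell
# Sketch: apply the settings discussed above when submitting a job.
# spark.io.compression.codec     -> use lz4 instead of the default snappy
# spark.shuffle.manager          -> hash (the Spark 1.1 default) instead of sort
# spark.shuffle.blockTransferService -> nio instead of netty
# Replace com.example.MyJob and my-job.jar with your own class and jar (placeholders).
spark-submit \
  --conf spark.io.compression.codec=lz4 \
  --conf spark.shuffle.manager=hash \
  --conf spark.shuffle.blockTransferService=nio \
  --class com.example.MyJob \
  my-job.jar
```

The same properties can instead be set once in conf/spark-defaults.conf so every submitted job picks them up.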