Yes. It improved the performance, and not only with Spark 1.2 but with Spark 1.1 as well. To be precise, the job took more time to run on Spark 1.2 with the default options, but completed in almost the same time as Spark 1.1 when both were run with “lz4”.

From: Aaron Davidson <ilike...@gmail.com>
Date: Saturday, 7 February 2015 1:22 am
To: Praveen Garg <praveen.g...@guavus.com>
Cc: Raghavendra Pandey <raghavendra.pan...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Shuffle read/write issue in spark 1.2

Did the problem go away when you switched to lz4? There was a change in the default compression codec from 1.0 to 1.1, where we went from LZF to Snappy. I don't think there was any such change from 1.1 to 1.2, though.
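(For reference, a small illustrative Scala sketch for checking which codec a job is actually picking up; the application name below is just a placeholder, and the fallback value assumes the 1.1/1.2 default of snappy:)

  import org.apache.spark.{SparkConf, SparkContext}

  // Placeholder app name, purely for illustration.
  val sc = new SparkContext(new SparkConf().setAppName("CodecCheck"))

  // Falls back to "snappy" if the codec was never set explicitly (the 1.1/1.2 default).
  println(sc.getConf.get("spark.io.compression.codec", "snappy"))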

On Fri, Feb 6, 2015 at 12:17 AM, Praveen Garg <praveen.g...@guavus.com> wrote:
We tried changing the compression codec from snappy to lz4. It did improve the performance, but we are still wondering why the default options didn’t work as claimed.
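(A rough sketch of how such a codec switch might be applied, assuming it is set programmatically on the SparkConf; the app name is a placeholder, and the same key can equally go into spark-defaults.conf:)

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("ShuffleHeavyJob")                // placeholder name
    .set("spark.io.compression.codec", "lz4")     // the 1.1/1.2 default is snappy

  val sc = new SparkContext(conf)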

From: Raghavendra Pandey <raghavendra.pan...@gmail.com>
Date: Friday, 6 February 2015 1:23 pm
To: Praveen Garg <praveen.g...@guavus.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Shuffle read/write issue in spark 1.2

I have observed the same issue as well.

On Fri, Feb 6, 2015 at 12:19 AM, Praveen Garg <praveen.g...@guavus.com> wrote:
Hi,

While moving from Spark 1.1 to Spark 1.2, we are facing an issue where shuffle read/write has increased significantly. We also tried running the job after rolling back to the Spark 1.1 configuration, setting spark.shuffle.manager to hash and spark.shuffle.blockTransferService to nio. It did improve the performance a bit, but it was still much worse than on Spark 1.1. The scenario seems similar to a bug raised some time back: https://issues.apache.org/jira/browse/SPARK-5081.
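(A minimal sketch of that rollback, assuming the settings are passed through SparkConf; the application name is a placeholder:)

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("ShuffleRollbackTest")                 // placeholder name
    .set("spark.shuffle.manager", "hash")              // sort became the default in 1.2
    .set("spark.shuffle.blockTransferService", "nio")  // netty became the default in 1.2

  val sc = new SparkContext(conf)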
Has anyone come across a similar issue? Please let us know if any configuration change can help.

Regards, Praveen


