On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> == What default changes should I be aware of? ==
>
> 1. The default value of "spark.io.compression.codec" is now "snappy"
> --> Old behavior can be restored by switching to "lzf"
>
> 2. PySpark now performs external spilling during aggregations.
> --> Old behavior can be restored by setting "spark.shuffle.spill" to
> "false".
>
> 3. PySpark uses a new heuristic for determining the parallelism of
> shuffle operations.
> --> Old behavior can be restored by setting
> "spark.default.parallelism" to the number of cores in the cluster.

Will these changes be called out in the release notes or somewhere in
the docs? That last one (which I believe is what we discovered as the
result of SPARK-3333
<https://issues.apache.org/jira/browse/SPARK-3333>) could have a large
impact on PySpark users.

Nick
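
[Editor's note: the three settings quoted above could be applied together as
`--conf` flags on a `spark-submit` invocation. The sketch below is an
illustration, not part of the original thread; the parallelism value of 8 is
a placeholder for the actual core count of a given cluster.]

```python
# Collect the three pre-1.1 defaults described above as spark-submit
# --conf key/value pairs.
legacy_conf = {
    "spark.io.compression.codec": "lzf",  # 1. new default is "snappy"
    "spark.shuffle.spill": "false",       # 2. disable PySpark external spilling
    "spark.default.parallelism": "8",     # 3. example: total cores in the cluster
}

cmd = ["spark-submit"]
for key, value in legacy_conf.items():
    cmd += ["--conf", f"{key}={value}"]
cmd.append("my_app.py")  # hypothetical application script

print(" ".join(cmd))
```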