On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell <pwend...@gmail.com> wrote:

> == What default changes should I be aware of? ==
> 1. The default value of "spark.io.compression.codec" is now "snappy"
> --> Old behavior can be restored by switching to "lzf"
>
> 2. PySpark now performs external spilling during aggregations.
> --> Old behavior can be restored by setting "spark.shuffle.spill" to
> "false".
>
> 3. PySpark uses a new heuristic for determining the parallelism of
> shuffle operations.
> --> Old behavior can be restored by setting
> "spark.default.parallelism" to the number of cores in the cluster.
>
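For reference, a minimal PySpark sketch that pins the pre-1.1 behavior for all
three settings listed above; the app name and the core count of 8 are
hypothetical placeholders, not values from the thread:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("restore-1.0-defaults")         # hypothetical app name
            .set("spark.io.compression.codec", "lzf")   # pre-1.1 compression codec
            .set("spark.shuffle.spill", "false")        # disable external spilling in PySpark aggregations
            .set("spark.default.parallelism", "8"))     # hypothetical: total cores in the cluster

    sc = SparkContext(conf=conf)

The same keys can also be set cluster-wide in conf/spark-defaults.conf rather
than per-application in code.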

Will these changes be called out in the release notes or somewhere in the
docs?

That last one (which I believe is what we discovered as the result of
SPARK-3333 <https://issues.apache.org/jira/browse/SPARK-3333>) could have a
large impact on PySpark users.

Nick
