[ 
https://issues.apache.org/jira/browse/SPARK-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985880#comment-13985880
 ] 

Diana Carroll commented on SPARK-823:
-------------------------------------

Yes, please clarify the documentation; I just ran into this. The Configuration 
guide (http://spark.apache.org/docs/latest/configuration.html) says the default 
is 8.

In testing this on Standalone Spark, there actually is no default value for the 
variable:
    scala> sc.getConf.contains("spark.default.parallelism")
    res1: Boolean = false

It looks like, if the variable is not set, the default behavior is decided 
in code, e.g. in Partitioner.scala:
{code}
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
{code}
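
To make the two cases concrete, here is a minimal sketch of that decision in plain Scala, with no Spark dependency. The names ParallelismSketch, numPartitions, and backendDefault are hypothetical and are not Spark APIs. The sketch assumes that bySize.head in the excerpt above is the parent RDD with the most partitions, and that defaultParallelism returns the configured value when the property is set. The per-backend defaults are the ones listed in the issue description for 0.7.3.

```scala
// Hypothetical model of the behavior discussed in this issue; not Spark code.
object ParallelismSketch {
  val Key = "spark.default.parallelism"

  // Scheduler backends named in the issue description (0.7.3).
  sealed trait Backend
  final case class Standalone(totalCoreCount: Int) extends Backend
  case object Mesos extends Backend
  final case class Local(threads: Int) extends Backend

  // Backend-specific default, as listed in the issue description.
  def backendDefault(backend: Backend): Int = backend match {
    case Standalone(cores) => math.max(cores, 2)
    case Mesos             => 8
    case Local(threads)    => threads
  }

  // Models the branch in the Partitioner.scala excerpt:
  // use the configured value if the key is present (assumption: this is
  // what defaultParallelism returns in that case), otherwise fall back to
  // the partition count of the largest parent RDD.
  def numPartitions(conf: Map[String, String], parentPartitionCounts: Seq[Int]): Int = {
    require(parentPartitionCounts.nonEmpty, "need at least one parent RDD")
    if (conf.contains(Key)) conf(Key).toInt
    else parentPartitionCounts.max
  }
}
```

Under these assumptions, an unset property means the partition count follows the input RDDs rather than any fixed number, which would explain why the documented default of 8 is not observed.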

> spark.default.parallelism's default is inconsistent across scheduler backends
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-823
>                 URL: https://issues.apache.org/jira/browse/SPARK-823
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, Spark Core
>    Affects Versions: 0.8.0, 0.7.3
>            Reporter: Josh Rosen
>            Priority: Minor
>
> The [0.7.3 configuration 
> guide|http://spark-project.org/docs/latest/configuration.html] says that 
> {{spark.default.parallelism}}'s default is 8, but the default is actually 
> max(totalCoreCount, 2) for the standalone scheduler backend, 8 for the Mesos 
> scheduler, and {{threads}} for the local scheduler:
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala#L157
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/mesos/MesosSchedulerBackend.scala#L317
> https://github.com/mesos/spark/blob/v0.7.3/core/src/main/scala/spark/scheduler/local/LocalScheduler.scala#L150
> Should this be clarified in the documentation?  Should the Mesos scheduler 
> backend's default be revised?



--
This message was sent by Atlassian JIRA
(v6.2#6252)