git commit: SPARK-4040. Update documentation to exemplify use of local (n) value, fo...
Repository: spark
Updated Branches:
  refs/heads/master 61a5cced0 -> 868cd4c3c

SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

Author: j...@apache.org jayunit100

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [j...@apache.org] SPARK-4040: Update documentation to exemplify use of local (n) value.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/868cd4c3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/868cd4c3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/868cd4c3

Branch: refs/heads/master
Commit: 868cd4c3ca11e6ecc4425b972d9a20c360b52425
Parents: 61a5cce
Author: j...@apache.org jayunit100
Authored: Wed Nov 5 15:45:34 2014 -0800
Committer: Matei Zaharia ma...@databricks.com
Committed: Wed Nov 5 15:45:34 2014 -0800

----------------------------------------------------------------------
 docs/configuration.md               | 10 ++++++++--
 docs/streaming-programming-guide.md | 14 +++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/868cd4c3/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 685101e..0f9eb81 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -21,16 +21,22 @@ application. These properties can be set directly on a
 [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
 `SparkContext`. `SparkConf` allows you to configure some of the common properties
 (e.g. master URL and application name), as well as arbitrary key-value pairs through the
-`set()` method. For example, we could initialize an application as follows:
+`set()` method. For example, we could initialize an application with two threads as follows:
+
+Note that we run with local[2], meaning two threads - which represents minimal parallelism,
+which can help detect bugs that only exist when we run in a distributed context.
 
 {% highlight scala %}
 val conf = new SparkConf()
-             .setMaster("local")
+             .setMaster("local[2]")
              .setAppName("CountingSheep")
              .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
+Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
+require one to prevent any sort of starvation issues.
+
 ## Dynamically Loading Spark Properties
 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, if you'd like to run the same application with different masters or different

http://git-wip-us.apache.org/repos/asf/spark/blob/868cd4c3/docs/streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 8bbba88..44a1f3a 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -68,7 +68,9 @@ import org.apache.spark._
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._
 
-// Create a local StreamingContext with two working thread and batch interval of 1 second
+// Create a local StreamingContext with two working thread and batch interval of 1 second.
+// The master requires 2 cores to prevent from a starvation scenario.
+
 val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
 val ssc = new StreamingContext(conf, Seconds(1))
 {% endhighlight %}
@@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver](
 A receiver is run within a Spark worker/executor as a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Hence, it is important to remember that Spark Streaming application needs to be allocated enough cores to process the received data, as well as, to run the receiver(s). Therefore, few important points to remember are:
-# Points to remember:
+# Points to remember
 {:.no_toc}
-- If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
-- When running locally, if you master URL is set to local, then there is only one core to run tasks. That is insufficient for programs with even one input DStream (file streams are okay) as the receiver will occupy that core and there will be no core left to process the data.
-
+- If the number of threads allocated to the application is less than or equal to the number of input DStreams / receivers, then the system
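The rule in the updated streaming guide says that a local application needs strictly more threads than it has receiver-based input DStreams, and that file streams do not count. That rule can be written as a small check. The sketch below is a hypothetical helper and not a Spark API. It assumes that only the `local`, `local[N]` and `local[*]` master forms matter, and it treats `local[*]` as the number of processors the JVM reports. The names `LocalMasterCheck`, `localThreads`, `hasEnoughThreads` and `minimalLocalMaster` are illustrative only.

```scala
// Hypothetical helper for illustration; not part of Spark's public API.
object LocalMasterCheck {
  // Matches "local[N]" and "local[*]"; Scala regex extractors match the whole string.
  private val LocalN = """local\[([0-9]+|\*)\]""".r

  // Number of worker threads a local-mode master URL implies, if the form is recognized.
  def localThreads(master: String): Option[Int] = master match {
    case "local"     => Some(1)
    case LocalN("*") => Some(Runtime.getRuntime.availableProcessors())
    case LocalN(n)   => Some(n.toInt)
    case _           => None // cluster URLs and other forms are out of scope for this sketch
  }

  // Each receiver pins one thread, so processing needs strictly more threads than receivers.
  // numReceivers counts receiver-based input DStreams only (file streams excluded).
  def hasEnoughThreads(master: String, numReceivers: Int): Boolean =
    localThreads(master).exists(_ > numReceivers)

  // Smallest local[N] that leaves one thread free for processing.
  def minimalLocalMaster(numReceivers: Int): String = s"local[${numReceivers + 1}]"

  def main(args: Array[String]): Unit = {
    println(hasEnoughThreads("local", 1))    // false: the receiver takes the only thread
    println(hasEnoughThreads("local[2]", 1)) // true: one receiver, one thread left to process
    println(hasEnoughThreads("local[2]", 2)) // false: both threads are held by receivers
    println(minimalLocalMaster(2))           // local[3]
  }
}
```

This check reflects why the `NetworkWordCount` example, which has a single socket receiver, uses `local[2]` instead of `local`.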
git commit: SPARK-4040. Update documentation to exemplify use of local (n) value, fo...
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 cf2f676f9 -> fe4ead299

SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

Author: j...@apache.org jayunit100

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [j...@apache.org] SPARK-4040: Update documentation to exemplify use of local (n) value.

(cherry picked from commit 868cd4c3ca11e6ecc4425b972d9a20c360b52425)
Signed-off-by: Matei Zaharia ma...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe4ead29
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe4ead29
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe4ead29

Branch: refs/heads/branch-1.2
Commit: fe4ead2995ab8529602090ed21941b6005a07c9d
Parents: cf2f676
Author: j...@apache.org jayunit100
Authored: Wed Nov 5 15:45:34 2014 -0800
Committer: Matei Zaharia ma...@databricks.com
Committed: Wed Nov 5 15:45:43 2014 -0800

----------------------------------------------------------------------
 docs/configuration.md               | 10 ++++++++--
 docs/streaming-programming-guide.md | 14 +++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)
----------------------------------------------------------------------
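SPARK-4040 mentions a starvation scenario that can occur when a streaming app runs with a single local thread. That scenario can be shown without Spark. The sketch below is an analogy and does not describe how Spark's scheduler works. It assumes that a fixed-size `java.util.concurrent` thread pool plays the role of the `n` in `local[n]`. A task that blocks until it is released plays the role of a long-running receiver, and a trivial task plays the role of batch processing. The names `StarvationDemo` and `processingRuns` are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Analogy only: a fixed thread pool stands in for local[n]; this is not Spark's scheduler.
object StarvationDemo {
  // Returns true if the "processing" task got a thread within the timeout.
  def processingRuns(threads: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val stopReceiver = new CountDownLatch(1)
    val processed = new CountDownLatch(1)

    // Long-running "receiver": holds its thread until it is told to stop.
    pool.submit(new Runnable {
      def run(): Unit = stopReceiver.await()
    })
    // "Processing" task: can only start if another thread is free.
    pool.submit(new Runnable {
      def run(): Unit = processed.countDown()
    })

    val ran = processed.await(500, TimeUnit.MILLISECONDS)
    stopReceiver.countDown() // release the receiver so the pool can shut down cleanly
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    ran
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread:  processing ran = ${processingRuns(1)}")
    println(s"2 threads: processing ran = ${processingRuns(2)}")
  }
}
```

With one thread, the processing task waits in the queue behind the blocked "receiver" until the timeout expires. This matches the `local` case that the removed guide text describes. With two threads, the processing task runs right away.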