git commit: SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

2014-11-05 Thread matei
Repository: spark
Updated Branches:
  refs/heads/master 61a5cced0 -> 868cd4c3c


SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

This is a minor docs update which helps to clarify the way local[n] is used for 
streaming apps.

Author: j...@apache.org jayunit100

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [j...@apache.org] SPARK-4040: Update documentation to exemplify use of 
local (n) value.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/868cd4c3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/868cd4c3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/868cd4c3

Branch: refs/heads/master
Commit: 868cd4c3ca11e6ecc4425b972d9a20c360b52425
Parents: 61a5cce
Author: j...@apache.org jayunit100
Authored: Wed Nov 5 15:45:34 2014 -0800
Committer: Matei Zaharia ma...@databricks.com
Committed: Wed Nov 5 15:45:34 2014 -0800

--
 docs/configuration.md   | 10 --
 docs/streaming-programming-guide.md | 14 +-
 2 files changed, 17 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/868cd4c3/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 685101e..0f9eb81 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -21,16 +21,22 @@ application. These properties can be set directly on a
 [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
 `SparkContext`. `SparkConf` allows you to configure some of the common properties
 (e.g. master URL and application name), as well as arbitrary key-value pairs through the
-`set()` method. For example, we could initialize an application as follows:
+`set()` method. For example, we could initialize an application with two threads as follows:
+
+Note that we run with local[2], meaning two threads - which represents minimal parallelism,
+which can help detect bugs that only exist when we run in a distributed context.
 
 {% highlight scala %}
 val conf = new SparkConf()
- .setMaster("local")
+ .setMaster("local[2]")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
+Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
+require one to prevent any sort of starvation issues.
+
 ## Dynamically Loading Spark Properties
 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, if you'd like to run the same application with different masters or different
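
For reference, the configuration snippet above can be written out as a complete program. This is only a sketch: the object name, the throwaway job, and the assumption that spark-core 1.x is on the classpath are illustrative and not part of the commit.

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

object CountingSheep {
  def main(args: Array[String]): Unit = {
    // local[2] runs driver and executors in one JVM with two worker threads,
    // the minimal parallelism that can still surface concurrency bugs which a
    // single-threaded "local" master would hide.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CountingSheep")
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)

    // Trivial two-partition job, just to exercise both threads.
    println(sc.parallelize(1 to 100, 2).count())

    sc.stop()
  }
}
{% endhighlight %}

Running it from an IDE or via spark-submit is enough to see both local threads at work.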

http://git-wip-us.apache.org/repos/asf/spark/blob/868cd4c3/docs/streaming-programming-guide.md
--
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 8bbba88..44a1f3a 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -68,7 +68,9 @@ import org.apache.spark._
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._
 
-// Create a local StreamingContext with two working thread and batch interval of 1 second
+// Create a local StreamingContext with two working thread and batch interval of 1 second.
+// The master requires 2 cores to prevent from a starvation scenario.
+
 val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
 val ssc = new StreamingContext(conf, Seconds(1))
 {% endhighlight %}
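
The snippet above is the opening of the guide's NetworkWordCount walk-through. A self-contained sketch of that example follows; it assumes a text server listening on localhost:9999 (for example `nc -lk 9999`) and spark-streaming 1.x on the classpath.

{% highlight scala %}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Two threads: one is permanently occupied by the socket receiver,
    // the other processes the 1-second batches. With plain "local" the
    // receiver would starve batch processing entirely.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" "))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
{% endhighlight %}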
@@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver](
 
 A receiver is run within a Spark worker/executor as a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Hence, it is important to remember that Spark Streaming application needs to be allocated enough cores to process the received data, as well as, to run the receiver(s). Therefore, few important points to remember are:
 
-# Points to remember:
+# Points to remember
 {:.no_toc}
-- If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
-- When running locally, if you master URL is set to local, then there is only one core to run tasks.  That is insufficient for programs with even one input DStream (file streams are okay) as the receiver will occupy that core and there will be no core left to process the data.
-
+- If the number of threads allocated to the application is less than or equal to the number of input DStreams / receivers, then the system
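
The thread arithmetic behind these points can be made concrete: each receiver pins one thread for the lifetime of the job, so a local run with two socket receivers needs at least local[3] to leave a thread free for processing. The sketch below is hypothetical; the object name and ports are illustrative.

{% highlight scala %}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TwoReceivers {
  def main(args: Array[String]): Unit = {
    // Two socket receivers occupy two threads for the lifetime of the job,
    // so a third thread is needed before any received data gets processed.
    val conf = new SparkConf().setMaster("local[3]").setAppName("TwoReceivers")
    val ssc = new StreamingContext(conf, Seconds(1))

    val streamA = ssc.socketTextStream("localhost", 9998)
    val streamB = ssc.socketTextStream("localhost", 9999)

    // Count records per batch across both streams.
    streamA.union(streamB).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
{% endhighlight %}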

git commit: SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

2014-11-05 Thread matei
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 cf2f676f9 -> fe4ead299


SPARK-4040. Update documentation to exemplify use of local (n) value, fo...

This is a minor docs update which helps to clarify the way local[n] is used for 
streaming apps.

Author: j...@apache.org jayunit100

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [j...@apache.org] SPARK-4040: Update documentation to exemplify use of 
local (n) value.

(cherry picked from commit 868cd4c3ca11e6ecc4425b972d9a20c360b52425)
Signed-off-by: Matei Zaharia ma...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe4ead29
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe4ead29
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe4ead29

Branch: refs/heads/branch-1.2
Commit: fe4ead2995ab8529602090ed21941b6005a07c9d
Parents: cf2f676
Author: j...@apache.org jayunit100
Authored: Wed Nov 5 15:45:34 2014 -0800
Committer: Matei Zaharia ma...@databricks.com
Committed: Wed Nov 5 15:45:43 2014 -0800

--
 docs/configuration.md   | 10 --
 docs/streaming-programming-guide.md | 14 +-
 2 files changed, 17 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fe4ead29/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 685101e..0f9eb81 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -21,16 +21,22 @@ application. These properties can be set directly on a
 [SparkConf](api/scala/index.html#org.apache.spark.SparkConf) passed to your
 `SparkContext`. `SparkConf` allows you to configure some of the common properties
 (e.g. master URL and application name), as well as arbitrary key-value pairs through the
-`set()` method. For example, we could initialize an application as follows:
+`set()` method. For example, we could initialize an application with two threads as follows:
+
+Note that we run with local[2], meaning two threads - which represents minimal parallelism,
+which can help detect bugs that only exist when we run in a distributed context.
 
 {% highlight scala %}
 val conf = new SparkConf()
- .setMaster("local")
+ .setMaster("local[2]")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
+Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
+require one to prevent any sort of starvation issues.
+
 ## Dynamically Loading Spark Properties
 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, if you'd like to run the same application with different masters or different

http://git-wip-us.apache.org/repos/asf/spark/blob/fe4ead29/docs/streaming-programming-guide.md
--
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 8bbba88..44a1f3a 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -68,7 +68,9 @@ import org.apache.spark._
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._
 
-// Create a local StreamingContext with two working thread and batch interval of 1 second
+// Create a local StreamingContext with two working thread and batch interval of 1 second.
+// The master requires 2 cores to prevent from a starvation scenario.
+
 val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
 val ssc = new StreamingContext(conf, Seconds(1))
 {% endhighlight %}
@@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver](
 
 A receiver is run within a Spark worker/executor as a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Hence, it is important to remember that Spark Streaming application needs to be allocated enough cores to process the received data, as well as, to run the receiver(s). Therefore, few important points to remember are:
 
-# Points to remember:
+# Points to remember
 {:.no_toc}
-- If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
-- When running locally, if you master URL is set to local, then there is only one core to run tasks.  That is insufficient for programs with even one input DStream (file streams are okay) as the receiver will occupy that core and there will be no core left to process the data.
-
+- If the