Repository: spark
Updated Branches:
  refs/heads/branch-1.6 a387cef3a -> ebf87ebc0


[SPARK-11960][MLLIB][DOC] User guide for streaming tests

CC jkbradley mengxr josepablocam

Author: Feynman Liang <feynman.li...@gmail.com>

Closes #10005 from feynmanliang/streaming-test-user-guide.

(cherry picked from commit 55358889309cf2d856b72e72e0f3081dfdf61cfa)
Signed-off-by: Xiangrui Meng <m...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ebf87ebc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ebf87ebc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ebf87ebc

Branch: refs/heads/branch-1.6
Commit: ebf87ebc02075497f4682e3ad0f8e63d33f3b86e
Parents: a387cef
Author: Feynman Liang <feynman.li...@gmail.com>
Authored: Mon Nov 30 15:38:44 2015 -0800
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Mon Nov 30 15:38:51 2015 -0800

----------------------------------------------------------------------
 docs/mllib-guide.md                             |  1 +
 docs/mllib-statistics.md                        | 25 ++++++++++++++++++++
 .../examples/mllib/StreamingTestExample.scala   |  2 ++
 3 files changed, 28 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 54e35fc..43772ad 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -34,6 +34,7 @@ We list major functionality from both below, with links to detailed guides.
   * [correlations](mllib-statistics.html#correlations)
   * [stratified sampling](mllib-statistics.html#stratified-sampling)
   * [hypothesis testing](mllib-statistics.html#hypothesis-testing)
+  * [streaming significance testing](mllib-statistics.html#streaming-significance-testing)
   * [random data generation](mllib-statistics.html#random-data-generation)
 * [Classification and regression](mllib-classification-regression.html)
   * [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html)

http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/docs/mllib-statistics.md
----------------------------------------------------------------------
diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md
index ade5b07..de209f6 100644
--- a/docs/mllib-statistics.md
+++ b/docs/mllib-statistics.md
@@ -521,6 +521,31 @@ print(testResult) # summary of the test including the p-value, test statistic,
 </div>
 </div>
 
+### Streaming Significance Testing
+MLlib provides online implementations of some tests to support use cases
+like A/B testing. These tests may be performed on a Spark Streaming
+`DStream[(Boolean,Double)]` where the first element of each tuple
+indicates control group (`false`) or treatment group (`true`) and the
+second element is the value of an observation.
+
+Streaming significance testing supports the following parameters:
+
+* `peacePeriod` - The number of initial batches from the stream to
+ignore, used to mitigate novelty effects.
+* `windowSize` - The number of past batches to perform hypothesis
+testing over. Setting to `0` will perform cumulative processing using
+all prior batches.
+
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`StreamingTest`](api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest)
+provides streaming hypothesis testing.
+
+{% include_example scala/org/apache/spark/examples/mllib/StreamingTestExample.scala %}
+</div>
+</div>
+
 
 ## Random data generation
 

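The input format and the `peacePeriod`/`windowSize` semantics described in the new guide section can be modeled in plain Scala with no Spark dependency. The sketch below is hypothetical and is not how `StreamingTest` is implemented: it treats each batch as an in-memory `Seq`, assumes `peacePeriod` is counted in whole batches, and computes only Welch's t statistic for treatment minus control (no p-value). `Summary`, `parseLine`, `selectBatches`, and `welchT` are illustrative names.

```scala
// Plain-Scala sketch (no Spark dependency); every name here is hypothetical.
object StreamingTestSketch {
  // Per-group running summary via Welford's update: count, mean, sum of squared deviations
  final case class Summary(n: Long, mean: Double, m2: Double) {
    def add(x: Double): Summary = {
      val n1 = n + 1
      val delta = x - mean
      val newMean = mean + delta / n1
      Summary(n1, newMean, m2 + delta * (x - newMean))
    }
    // Unbiased sample variance; undefined (NaN) for fewer than two observations
    def variance: Double = if (n > 1) m2 / (n - 1) else Double.NaN
  }
  object Summary { val empty: Summary = Summary(0L, 0.0, 0.0) }

  // Parse a "label,value" line into (isTreatment, observation), as in StreamingTestExample
  def parseLine(line: String): (Boolean, Double) = line.split(",") match {
    case Array(label, value) => (label.trim.toBoolean, value.trim.toDouble)
    case _ => throw new IllegalArgumentException(s"Malformed line: $line")
  }

  // Drop the first `peacePeriod` batches, then keep every remaining batch
  // (windowSize == 0, cumulative) or only the latest `windowSize` batches
  def selectBatches[A](batches: Seq[Seq[A]], peacePeriod: Int, windowSize: Int): Seq[Seq[A]] = {
    val afterPeace = batches.drop(peacePeriod)
    if (windowSize == 0) afterPeace else afterPeace.takeRight(windowSize)
  }

  // Welch's t statistic: (mean_treatment - mean_control) / sqrt(s_t^2/n_t + s_c^2/n_c)
  def welchT(data: Seq[(Boolean, Double)]): Double = {
    val (treatment, control) = data.foldLeft((Summary.empty, Summary.empty)) {
      case ((t, c), (isTreatment, x)) => if (isTreatment) (t.add(x), c) else (t, c.add(x))
    }
    (treatment.mean - control.mean) /
      math.sqrt(treatment.variance / treatment.n + control.variance / control.n)
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      Seq("true,9.0", "false,0.0"), // falls inside the peace period below
      Seq("false,1.0", "false,2.0", "true,4.0"),
      Seq("false,3.0", "true,5.0", "true,6.0")
    ).map(_.map(parseLine))
    val selected = selectBatches(batches, peacePeriod = 1, windowSize = 0).flatten
    println(s"Welch t = ${welchT(selected)}")
  }
}
```

Running `main` discards the first batch as the peace period and, because `windowSize = 0`, pools the remaining two batches cumulatively; with `windowSize = 1` only the latest batch would be tested. Welford's update lets each group's summary be extended one observation at a time without keeping past data around.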
http://git-wip-us.apache.org/repos/asf/spark/blob/ebf87ebc/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala
----------------------------------------------------------------------
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala
index ab29f90..b6677c6 100644
--- a/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingTestExample.scala
@@ -64,6 +64,7 @@ object StreamingTestExample {
       dir.toString
     })
 
+    // $example on$
     val data = ssc.textFileStream(dataDir).map(line => line.split(",") match {
       case Array(label, value) => (label.toBoolean, value.toDouble)
     })
@@ -75,6 +76,7 @@ object StreamingTestExample {
 
     val out = streamingTest.registerStream(data)
     out.print()
+    // $example off$
 
     // Stop processing if test becomes significant or we time out
     var timeoutCounter = numBatchesTimeout

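The `// $example on$` and `// $example off$` comments added in this hunk delimit the region that the guide's `{% include_example %}` tag renders, so the page shows the code from the stream parsing through `out.print()` and omits the surrounding setup and the timeout loop. The idea can be sketched in plain Scala; this is a hypothetical model, not Spark's docs plugin (a Jekyll plugin written in Ruby), `extractExample` is an illustrative name, and stripping the common indentation is an assumption of this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

object ExampleRegionSketch {
  // Hypothetical helper: return only the lines between the "$example on$" and
  // "$example off$" marker comments, with their common indentation removed.
  def extractExample(source: String): String = {
    val kept = ArrayBuffer.empty[String]
    var inside = false
    for (line <- source.split("\n", -1)) {
      if (line.contains("$example on$")) inside = true
      else if (line.contains("$example off$")) inside = false
      else if (inside) kept += line
    }
    // Smallest leading-space count among non-blank kept lines (0 if none)
    val nonBlank = kept.filter(_.trim.nonEmpty)
    val indent = if (nonBlank.isEmpty) 0 else nonBlank.map(_.takeWhile(_ == ' ').length).min
    kept.map(_.drop(indent)).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val source =
      """object Demo {
        |  def run(): Unit = {
        |    // $example on$
        |    val x = 1
        |    println(x + 1)
        |    // $example off$
        |  }
        |}""".stripMargin
    println(extractExample(source))
  }
}
```

Here `main` prints the two lines `val x = 1` and `println(x + 1)`, flush-left, and nothing from the enclosing `object` or method.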
