[06/11] incubator-gearpump git commit: [GEARPUMP-213] build docs with MkDocs

mauzhang Thu, 29 Sep 2016 05:07:07 -0700

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/dev/dev-write-1st-app.md
----------------------------------------------------------------------
diff --git a/docs/docs/dev/dev-write-1st-app.md 
b/docs/docs/dev/dev-write-1st-app.md
new file mode 100644
index 0000000..f28a303
--- /dev/null
+++ b/docs/docs/dev/dev-write-1st-app.md
@@ -0,0 +1,370 @@
+We'll use 
[wordcount](https://github.com/apache/incubator-gearpump/tree/master/examples/streaming/wordcount/src/main/scala/org/apache/gearpump/streaming/examples/wordcount)
 as an example to illustrate how to write Gearpump applications.
+
+### Maven/Sbt Settings
+
+Repository and library dependencies can be found at [Maven 
Setting](http://gearpump.apache.org/downloads.html#maven-dependencies).
+
+### IDE Setup (Optional)
+You can get your preferred IDE ready for Gearpump by following [this 
guide](dev-ide-setup).
+
+### Decide which language and API to use for writing 
+Gearpump supports two level APIs:
+
+1. Low level API, which is more similar to Akka programming, operating on each 
event. The API document can be found at [Low Level API 
Doc](http://gearpump.apache.org/releases/latest/api/scala/index.html#org.apache.gearpump.streaming.package).
+
+2. High level API (aka DSL), which is operating on streaming instead of 
individual event. The API document can be found at [DSL API 
Doc](http://gearpump.apache.org/releases/latest/api/scala/index.html#org.apache.gearpump.streaming.dsl.package).
+
+And both APIs have their Java version and Scala version.
+
+So, before you writing your first Gearpump application, you need to decide 
which API to use and which language to use. 
+
+## DSL version for Wordcount
+
+The easiest way to write your streaming application is to write it with 
Gearpump DSL. 
+Below will demostrate how to write WordCount application via Gearpump DSL.
+
+#### In Scala
+
+       :::scala     
+       /** WordCount with High level DSL */
+       object WordCount extends AkkaApp with ArgumentsParser {
+       
+         override val options: Array[(String, CLIOption[Any])] = Array.empty
+       
+         override def main(akkaConf: Config, args: Array[String]): Unit = {
+           val context = ClientContext(akkaConf)
+           val app = StreamApp("dsl", context)
+           val data = "This is a good start, bingo!! bingo!!"
+       
+           //count for each word and output to log
+           app.source(data.lines.toList, 1, "source").
+             // word => (word, count)
+             flatMap(line => line.split("[\\s]+")).map((_, 1)).
+             // (word, count1), (word, count2) => (word, count1 + count2)
+             groupByKey().sum.log
+       
+           val appId = context.submit(app)
+           context.close()
+         }
+       }
+       
+
+#### In Java
+
+       :::java   
+       /** Java version of WordCount with high level DSL API */
+       public class WordCount {
+       
+         public static void main(String[] args) throws InterruptedException {
+           main(ClusterConfig.defaultConfig(), args);
+         }
+       
+         public static void main(Config akkaConf, String[] args) throws 
InterruptedException {
+           ClientContext context = new ClientContext(akkaConf);
+           JavaStreamApp app = new JavaStreamApp("JavaDSL", context, 
UserConfig.empty());
+           List<String> source = Lists.newArrayList("This is a good start, 
bingo!! bingo!!");
+       
+           //create a stream from the string list.
+           JavaStream<String> sentence = app.source(source, 1, 
UserConfig.empty(), "source");
+       
+           //tokenize the strings and create a new stream
+           JavaStream<String> words = sentence.flatMap(new 
FlatMapFunction<String, String>() {
+             @Override
+             public Iterator<String> apply(String s) {
+               return Lists.newArrayList(s.split("\\s+")).iterator();
+             }
+           }, "flatMap");
+       
+           //map each string as (string, 1) pair
+           JavaStream<Tuple2<String, Integer>> ones = words.map(new 
MapFunction<String, Tuple2<String, Integer>>() {
+             @Override
+             public Tuple2<String, Integer> apply(String s) {
+               return new Tuple2<String, Integer>(s, 1);
+             }
+           }, "map");
+       
+           //group by according to string
+           JavaStream<Tuple2<String, Integer>> groupedOnes = ones.groupBy(new 
GroupByFunction<Tuple2<String, Integer>, String>() {
+             @Override
+             public String apply(Tuple2<String, Integer> tuple) {
+               return tuple._1();
+             }
+           }, 1, "groupBy");
+       
+           //for each group, make the sum
+           JavaStream<Tuple2<String, Integer>> wordcount = 
groupedOnes.reduce(new ReduceFunction<Tuple2<String, Integer>>() {
+             @Override
+             public Tuple2<String, Integer> apply(Tuple2<String, Integer> t1, 
Tuple2<String, Integer> t2) {
+               return new Tuple2<String, Integer>(t1._1(), t1._2() + t2._2());
+             }
+           }, "reduce");
+       
+           //output result using log
+           wordcount.log();
+       
+           app.run();
+           context.close();
+         }
+       }
+
+
+## Low level API based Wordcount
+
+### Define Processor(Task) class and Partitioner class
+
+An application is a Directed Acyclic Graph (DAG) of processors. In the 
wordcount example, We will firstly define two processors `Split` and `Sum`, and 
then weave them together.
+
+
+#### Split processor
+
+In the `Split` processor, we simply split a predefined text (the content is 
simplified for conciseness) and send out each split word to `Sum`.
+
+#### In Scala
+
+       :::scala
+       class Split(taskContext : TaskContext, conf: UserConfig) extends 
Task(taskContext, conf) {
+         import taskContext.output
+       
+         override def onStart(startTime : StartTime) : Unit = {
+           self ! Message("start")
+         }
+       
+         override def onNext(msg : Message) : Unit = {
+           Split.TEXT_TO_SPLIT.lines.foreach { line =>
+             line.split("[\\s]+").filter(_.nonEmpty).foreach { msg =>
+               output(new Message(msg, System.currentTimeMillis()))
+             }
+           }
+           self ! Message("continue", System.currentTimeMillis())
+         }
+       }
+       
+       object Split {
+         val TEXT_TO_SPLIT = "some text"
+       }
+
+#### In Java
+
+       :::java
+       public class Split extends Task {
+       
+         public static String TEXT = "This is a good start for java! bingo! 
bingo! ";
+       
+         public Split(TaskContext taskContext, UserConfig userConf) {
+           super(taskContext, userConf);
+         }
+       
+         private Long now() {
+           return System.currentTimeMillis();
+         }
+       
+         @Override
+         public void onStart(StartTime startTime) {
+           self().tell(new Message("start", now()), self());
+         }
+       
+         @Override
+         public void onNext(Message msg) {
+       
+           // Split the TEXT to words
+           String[] words = TEXT.split(" ");
+           for (int i = 0; i < words.length; i++) {
+             context.output(new Message(words[i], now()));
+           }
+           self().tell(new Message("next", now()), self());
+         }
+       }
+       ```
+
+
+Essentially, each processor consists of two descriptions:
+
+1. A `Task` to define the operation.
+
+2. A parallelism level to define the number of tasks of this processor in 
parallel. 
+ 
+Just like `Split`, every processor extends `Task`.  The `onStart` method is 
called once before any message comes in; `onNext` method is called to process 
every incoming message. Note that Gearpump employs the message-driven model and 
that's why Split sends itself a message at the end of `onStart` and `onNext` to 
trigger next message processing.
+
+#### Sum Processor
+
+The structure of `Sum` processor looks much alike. `Sum` does not need to send 
messages to itself since it receives messages from `Split`.
+
+#### In Scala
+
+       :::scala
+       class Sum (taskContext : TaskContext, conf: UserConfig) extends 
Task(taskContext, conf) {
+         private[wordcount] val map : mutable.HashMap[String, Long] = new 
mutable.HashMap[String, Long]()
+       
+         private[wordcount] var wordCount : Long = 0
+         private var snapShotTime : Long = System.currentTimeMillis()
+         private var snapShotWordCount : Long = 0
+       
+         private var scheduler : Cancellable = null
+       
+         override def onStart(startTime : StartTime) : Unit = {
+           scheduler = taskContext.schedule(new FiniteDuration(5, 
TimeUnit.SECONDS),
+             new FiniteDuration(5, TimeUnit.SECONDS))(reportWordCount)
+         }
+       
+         override def onNext(msg : Message) : Unit = {
+           if (null == msg) {
+             return
+           }
+           val current = map.getOrElse(msg.msg.asInstanceOf[String], 0L)
+           wordCount += 1
+           map.put(msg.msg.asInstanceOf[String], current + 1)
+         }
+       
+         override def onStop() : Unit = {
+           if (scheduler != null) {
+             scheduler.cancel()
+           }
+         }
+       
+         def reportWordCount() : Unit = {
+           val current : Long = System.currentTimeMillis()
+           LOG.info(s"Task ${taskContext.taskId} Throughput: ${(wordCount - 
snapShotWordCount, (current - snapShotTime) / 1000)} (words, second)")
+           snapShotWordCount = wordCount
+           snapShotTime = current
+         }
+       }
+       
+#### In Java
+
+       :::java
+       public class Sum extends Task {
+       
+         private Logger LOG = super.LOG();
+         private HashMap<String, Integer> wordCount = new HashMap<String, 
Integer>();
+       
+         public Sum(TaskContext taskContext, UserConfig userConf) {
+           super(taskContext, userConf);
+         }
+       
+         @Override
+         public void onStart(StartTime startTime) {
+           //skip
+         }
+       
+         @Override
+         public void onNext(Message messagePayLoad) {
+           String word = (String) (messagePayLoad.msg());
+           Integer current = wordCount.get(word);
+           if (current == null) {
+             current = 0;
+           }
+           Integer newCount = current + 1;
+           wordCount.put(word, newCount);
+         }
+       }
+
+
+Besides counting the sum, in Scala version, we also define a scheduler to 
report throughput every 5 seconds. The scheduler should be cancelled when the 
computation completes, which could be accomplished overriding the `onStop` 
method. The default implementation of `onStop` is a no-op.
+
+#### Partitioner
+
+A processor could be parallelized to a list of tasks. A `Partitioner` defines 
how the data is shuffled among tasks of Split and Sum. Gearpump has already 
provided two partitioners
+
+* `HashPartitioner`: partitions data based on the message's hashcode
+* `ShufflePartitioner`: partitions data in a round-robin way.
+
+You could define your own partitioner by extending the `Partitioner` 
trait/interface and overriding the `getPartition` method.
+
+
+       :::scala
+       trait Partitioner extends Serializable {
+         def getPartition(msg : Message, partitionNum : Int) : Int
+       }
+
+### Wrap up as an application 
+
+Now, we are able to write our application class, weaving the above components 
together.
+
+The application class extends `App` and `ArgumentsParser which make it easier 
to parse arguments and run main functions.
+
+#### In Scala
+
+       :::scala
+       object WordCount extends App with ArgumentsParser {
+         private val LOG: Logger = LogUtil.getLogger(getClass)
+         val RUN_FOR_EVER = -1
+       
+         override val options: Array[(String, CLIOption[Any])] = Array(
+           "split" -> CLIOption[Int]("<how many split tasks>", required = 
false, defaultValue = Some(1)),
+           "sum" -> CLIOption[Int]("<how many sum tasks>", required = false, 
defaultValue = Some(1))
+         )
+       
+         def application(config: ParseResult) : StreamApplication = {
+           val splitNum = config.getInt("split")
+           val sumNum = config.getInt("sum")
+           val partitioner = new HashPartitioner()
+           val split = Processor[Split](splitNum)
+           val sum = Processor[Sum](sumNum)
+           val app = StreamApplication("wordCount", Graph[Processor[_ <: 
Task], Partitioner](split ~ partitioner ~> sum), UserConfig.empty)
+           app
+         }
+       
+         val config = parse(args)
+         val context = ClientContext()
+         val appId = context.submit(application(config))
+         context.close()
+       }
+       
+
+
+We override `options` value and define an array of command line arguments to 
parse. We want application users to pass in masters' hosts and ports, the 
parallelism of split and sum tasks, and how long to run the example. We also 
specify whether an option is `required` and provide `defaultValue` for some 
arguments.
+
+#### In Java
+
+       :::java
+       
+       /** Java version of WordCount with Processor Graph API */
+       public class WordCount {
+       
+         public static void main(String[] args) throws InterruptedException {
+           main(ClusterConfig.defaultConfig(), args);
+         }
+       
+         public static void main(Config akkaConf, String[] args) throws 
InterruptedException {
+       
+           // For split task, we config to create two tasks
+           int splitTaskNumber = 2;
+           Processor split = new 
Processor(Split.class).withParallelism(splitTaskNumber);
+       
+           // For sum task, we have two summer.
+           int sumTaskNumber = 2;
+           Processor sum = new 
Processor(Sum.class).withParallelism(sumTaskNumber);
+       
+           // construct the graph
+           Graph graph = new Graph();
+           graph.addVertex(split);
+           graph.addVertex(sum);
+       
+           Partitioner partitioner = new HashPartitioner();
+           graph.addEdge(split, partitioner, sum);
+       
+           UserConfig conf = UserConfig.empty();
+           StreamApplication app = new StreamApplication("wordcountJava", 
conf, graph);
+       
+           // create master client
+           // It will read the master settings under gearpump.cluster.masters
+           ClientContext masterClient = new ClientContext(akkaConf);
+       
+           masterClient.submit(app);
+       
+           masterClient.close();
+         }
+       }
+
+
+## Submit application
+
+After all these, you need to package everything into a uber jar and submit the 
jar to Gearpump Cluster. Please check [Application submission 
tool](../introduction/commandline) to command line tool syntax.
+
+## Advanced topic
+For a real application, you definitely need to define your own customized 
message passing between processors.
+Customized message needs customized serializer to help message passing over 
wire.
+Check [this guide](dev-custom-serializer) for how to customize serializer.
+
+### Gearpump for Non-Streaming Usage
+Gearpump is also able to as a base platform to develop non-streaming 
applications. See [this guide](dev-non-streaming-example) on how to use 
Gearpump to develop a distributed shell.


http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/actor_hierarchy.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/actor_hierarchy.png 
b/docs/docs/img/actor_hierarchy.png
new file mode 100644
index 0000000..d971745
Binary files /dev/null and b/docs/docs/img/actor_hierarchy.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/checkpoint_equation.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/checkpoint_equation.png 
b/docs/docs/img/checkpoint_equation.png
new file mode 100644
index 0000000..14da93b
Binary files /dev/null and b/docs/docs/img/checkpoint_equation.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/checkpoint_interval_equation.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/checkpoint_interval_equation.png 
b/docs/docs/img/checkpoint_interval_equation.png
new file mode 100644
index 0000000..0c0414c
Binary files /dev/null and b/docs/docs/img/checkpoint_interval_equation.png 
differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/checkpointing.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/checkpointing.png b/docs/docs/img/checkpointing.png
new file mode 100644
index 0000000..f11eb53
Binary files /dev/null and b/docs/docs/img/checkpointing.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/checkpointing_interval.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/checkpointing_interval.png 
b/docs/docs/img/checkpointing_interval.png
new file mode 100644
index 0000000..dc46317
Binary files /dev/null and b/docs/docs/img/checkpointing_interval.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/clock.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/clock.png b/docs/docs/img/clock.png
new file mode 100644
index 0000000..906d51d
Binary files /dev/null and b/docs/docs/img/clock.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/dag.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/dag.png b/docs/docs/img/dag.png
new file mode 100644
index 0000000..c0ca79f
Binary files /dev/null and b/docs/docs/img/dag.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/dashboard.gif
----------------------------------------------------------------------
diff --git a/docs/docs/img/dashboard.gif b/docs/docs/img/dashboard.gif
new file mode 100644
index 0000000..0170c5f
Binary files /dev/null and b/docs/docs/img/dashboard.gif differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/dashboard.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/dashboard.png b/docs/docs/img/dashboard.png
new file mode 100644
index 0000000..0b5eedd
Binary files /dev/null and b/docs/docs/img/dashboard.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/dashboard_3.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/dashboard_3.png b/docs/docs/img/dashboard_3.png
new file mode 100644
index 0000000..47259fc
Binary files /dev/null and b/docs/docs/img/dashboard_3.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/download.jpg
----------------------------------------------------------------------
diff --git a/docs/docs/img/download.jpg b/docs/docs/img/download.jpg
new file mode 100644
index 0000000..7129c52
Binary files /dev/null and b/docs/docs/img/download.jpg differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/dynamic.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/dynamic.png b/docs/docs/img/dynamic.png
new file mode 100644
index 0000000..09b8a35
Binary files /dev/null and b/docs/docs/img/dynamic.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/exact.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/exact.png b/docs/docs/img/exact.png
new file mode 100644
index 0000000..f11eb53
Binary files /dev/null and b/docs/docs/img/exact.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/failures.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/failures.png b/docs/docs/img/failures.png
new file mode 100644
index 0000000..fa98cdc
Binary files /dev/null and b/docs/docs/img/failures.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/flow_control.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/flow_control.png b/docs/docs/img/flow_control.png
new file mode 100644
index 0000000..7ea9bd2
Binary files /dev/null and b/docs/docs/img/flow_control.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/flowcontrol.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/flowcontrol.png b/docs/docs/img/flowcontrol.png
new file mode 100644
index 0000000..7ea9bd2
Binary files /dev/null and b/docs/docs/img/flowcontrol.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/ha.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/ha.png b/docs/docs/img/ha.png
new file mode 100644
index 0000000..5474d84
Binary files /dev/null and b/docs/docs/img/ha.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/kafka_wordcount.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/kafka_wordcount.png 
b/docs/docs/img/kafka_wordcount.png
new file mode 100644
index 0000000..a43fa55
Binary files /dev/null and b/docs/docs/img/kafka_wordcount.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/layout.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/layout.png b/docs/docs/img/layout.png
new file mode 100644
index 0000000..edffdf8
Binary files /dev/null and b/docs/docs/img/layout.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/logo.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/logo.png b/docs/docs/img/logo.png
new file mode 100644
index 0000000..7575892
Binary files /dev/null and b/docs/docs/img/logo.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/logo.svg
----------------------------------------------------------------------
diff --git a/docs/docs/img/logo.svg b/docs/docs/img/logo.svg
new file mode 100644
index 0000000..5897ca4
--- /dev/null
+++ b/docs/docs/img/logo.svg
@@ -0,0 +1,71 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+   xmlns:dc="http://purl.org/dc/elements/1.1/";
+   xmlns:cc="http://creativecommons.org/ns#";
+   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
+   xmlns:svg="http://www.w3.org/2000/svg";
+   xmlns="http://www.w3.org/2000/svg";
+   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd";
+   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape";
+   width="89mm"
+   height="89mm"
+   viewBox="0 0 89 89"
+   id="svg4684"
+   version="1.1"
+   inkscape:version="0.91 r13725"
+   sodipodi:docname="gearpump_logo_simple.svg"
+   
inkscape:export-filename="/Users/kamkasravi/Dropbox/gearpump/gearpump_logo_simple.png"
+   inkscape:export-xdpi="157.11"
+   inkscape:export-ydpi="157.11">
+  <defs
+     id="defs4686" />
+  <sodipodi:namedview
+     inkscape:document-units="mm"
+     pagecolor="#ffffff"
+     bordercolor="#666666"
+     borderopacity="0"
+     inkscape:pageopacity="0"
+     inkscape:pageshadow="2"
+     inkscape:zoom="1.2162434"
+     inkscape:cx="395.86179"
+     inkscape:cy="372.04726"
+     inkscape:current-layer="layer1"
+     id="namedview4688"
+     showgrid="false"
+     inkscape:window-width="1920"
+     inkscape:window-height="1107"
+     inkscape:window-x="0"
+     inkscape:window-y="1"
+     inkscape:window-maximized="1"
+     fit-margin-top="0"
+     fit-margin-left="0"
+     fit-margin-right="0"
+     fit-margin-bottom="0"
+     inkscape:showpageshadow="false"
+     showborder="true" />
+  <metadata
+     id="metadata4690">
+    <rdf:RDF>
+      <cc:Work
+         rdf:about="">
+        <dc:format>image/svg+xml</dc:format>
+        <dc:type
+           rdf:resource="http://purl.org/dc/dcmitype/StillImage"; />
+        <dc:title />
+      </cc:Work>
+    </rdf:RDF>
+  </metadata>
+  <g
+     inkscape:label="Layer 1"
+     inkscape:groupmode="layer"
+     id="layer1"
+     transform="translate(5.4566873e-7,-121)">
+    <path
+       
style="fill:#556ae6;fill-opacity:1;stroke:none;stroke-width:7.08661413;stroke-miterlimit:4;stroke-dasharray:none"
+       d="m 59.451523,122.95775 a 44.775326,44.775326 0 0 0 -21.03861,-2.01136 
l -0.6557,4.60251 a 23.524902,23.524902 0 0 1 4.20498,3.77358 l 
6.73498,-1.78044 a 29.907628,29.907628 0 0 1 4.33792,8.99185 l -5.54364,4.16221 
a 23.524902,23.524902 0 0 1 0.32108,5.63603 l 6.03185,3.50988 a 
29.907628,29.907628 0 0 1 -1.23578,4.84982 29.907628,29.907628 0 0 1 
-2.06484,4.5672 l -6.85567,-0.97589 a 23.524902,23.524902 0 0 1 
-3.77339,4.20446 l 1.78078,6.73569 a 29.907628,29.907628 0 0 1 -8.99151,4.33863 
l -4.16256,-5.54435 a 23.524902,23.524902 0 0 1 -5.63602,0.32108 l 
-3.50989,6.03185 a 29.907628,29.907628 0 0 1 -4.849808,-1.23578 
29.907628,29.907628 0 0 1 -4.5675477,-2.06554 l 0.9757077,-6.85515 a 
23.524902,23.524902 0 0 1 -4.2041157,-3.7727 l -6.64489998,1.75628 a 
44.775326,44.775326 0 0 0 29.97375368,45.3569 44.775326,44.775326 0 0 0 
27.28111,0.61075 l -0.0231,-0.19065 a 23.524902,23.524902 0 0 1 
-5.04173,-2.54001 l -6.05573,3.46919 a 29.907628,29.907628 0 0 1 
-3.58191,-3.49598 29.907628
 ,29.907628 0 0 1 -2.9233,-4.07194 l 4.27314,-5.44883 a 23.524902,23.524902 0 0 
1 -1.75454,-5.36936 l -6.72324,-1.82616 a 29.907628,29.907628 0 0 1 
0.73798,-9.9567 l 6.88304,-0.8322 a 23.524902,23.524902 0 0 1 2.54001,-5.04173 
l -3.46885,-6.05502 a 29.907628,29.907628 0 0 1 3.49494,-3.58228 
29.907628,29.907628 0 0 1 4.07194,-2.92331 l 5.44901,4.27262 a 
23.524902,23.524902 0 0 1 5.36988,-1.75436 l 1.82616,-6.72324 a 
29.907628,29.907628 0 0 1 9.95599,0.73833 l 0.8329,6.8827 a 23.524902,23.524902 
0 0 1 5.04121,2.53983 l 6.055556,-3.46867 a 29.907628,29.907628 0 0 1 
3.58227,3.49494 29.907628,29.907628 0 0 1 1.52975,1.98522 44.775326,44.775326 0 
0 0 -29.979576,-45.3099 z m -29.93126,7.04882 a 15.745845,15.745845 0 0 0 
-20.0396247,9.71017 15.745845,15.745845 0 0 0 9.7101647,20.03962 
15.745845,15.745845 0 0 0 20.03963,-9.71016 15.745845,15.745845 0 0 0 
-9.71017,-20.03963 z m 59.360536,42.82725 -2.32895,2.97009 a 
23.524902,23.524902 0 0 1 0.99553,2.56761 44.775326,44.775326 0 0 0 1.33342,-5.
 5377 z m -19.143966,-2.09047 a 15.745845,15.745845 0 0 0 -14.75675,3.19169 
15.745845,15.745845 0 0 0 -1.61104,22.20982 15.745845,15.745845 0 0 0 
21.90246,1.85092 44.775326,44.775326 0 0 0 2.25015,-2.25582 15.745845,15.745845 
0 0 0 -0.3321,-20.19458 15.745845,15.745845 0 0 0 -7.45272,-4.80203 z"
+       id="path5303"
+       inkscape:connector-curvature="0" />
+  </g>
+</svg>

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/logo2.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/logo2.png b/docs/docs/img/logo2.png
new file mode 100644
index 0000000..959d39e
Binary files /dev/null and b/docs/docs/img/logo2.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/messageLoss.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/messageLoss.png b/docs/docs/img/messageLoss.png
new file mode 100644
index 0000000..80b330a
Binary files /dev/null and b/docs/docs/img/messageLoss.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/netty_transport.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/netty_transport.png 
b/docs/docs/img/netty_transport.png
new file mode 100644
index 0000000..17d57c3
Binary files /dev/null and b/docs/docs/img/netty_transport.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/replay.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/replay.png b/docs/docs/img/replay.png
new file mode 100644
index 0000000..8bbbc43
Binary files /dev/null and b/docs/docs/img/replay.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/shuffle.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/shuffle.png b/docs/docs/img/shuffle.png
new file mode 100644
index 0000000..40c4a2d
Binary files /dev/null and b/docs/docs/img/shuffle.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/storm_gearpump_cluster.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/storm_gearpump_cluster.png 
b/docs/docs/img/storm_gearpump_cluster.png
new file mode 100644
index 0000000..d318623
Binary files /dev/null and b/docs/docs/img/storm_gearpump_cluster.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/storm_gearpump_dag.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/storm_gearpump_dag.png 
b/docs/docs/img/storm_gearpump_dag.png
new file mode 100644
index 0000000..24920f7
Binary files /dev/null and b/docs/docs/img/storm_gearpump_dag.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/submit.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/submit.png b/docs/docs/img/submit.png
new file mode 100644
index 0000000..609c0d7
Binary files /dev/null and b/docs/docs/img/submit.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/submit2.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/submit2.png b/docs/docs/img/submit2.png
new file mode 100644
index 0000000..d3939ee
Binary files /dev/null and b/docs/docs/img/submit2.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/img/through_vs_message_size.png
----------------------------------------------------------------------
diff --git a/docs/docs/img/through_vs_message_size.png 
b/docs/docs/img/through_vs_message_size.png
new file mode 100644
index 0000000..a98c528
Binary files /dev/null and b/docs/docs/img/through_vs_message_size.png differ

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/docs/index.md b/docs/docs/index.md
new file mode 100644
index 0000000..1cd98b8
--- /dev/null
+++ b/docs/docs/index.md
@@ -0,0 +1,35 @@
+---
+layout: global
+displayTitle: Gearpump Overview
+title: Overview
+description: Gearpump GEARPUMP_VERSION documentation homepage
+---
+
+Gearpump is a real-time big data streaming engine.
+It is inspired by recent advances in the [Akka](http://akka.io/) framework and 
a desire to improve on existing streaming frameworks.
+Gearpump is event/message based and featured as low latency handling, high 
performance, exactly once semantics,
+dynamic topology update, [Apache Storm](https://storm.apache.org/) 
compatibility, etc.
+
+The    name    Gearpump        is      a       reference to    the     
engineering term "gear  pump,"  which   is      a       super simple
+pump   that    consists of     only    two     gears,  but     is      very    
powerful at     streaming water.
+
+### Gearpump Technical Highlights
+Gearpump's feature set includes:
+
+* Extremely high performance
+* Low latency
+* Configurable message delivery guarantee (at least once, exactly once).
+* Highly extensible
+* Dynamic DAG
+* Storm compatibility
+* Samoa compatibility
+* Both high level and low level API
+
+### Gearpump Performance
+Per initial benchmarks we are able to process 18 million messages/second (100 
bytes per message) with a 8ms latency on a 4-node cluster.
+
+![Dashboard](img/dashboard.png)
+
+### Gearpump and Akka
+Gearpump is a 100% Akka based platform. We model big data streaming within the 
Akka actor hierarchy.
+![Actor Hierarchy](img/actor_hierarchy.png)

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/basic-concepts.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/basic-concepts.md 
b/docs/docs/introduction/basic-concepts.md
new file mode 100644
index 0000000..a4c49b7
--- /dev/null
+++ b/docs/docs/introduction/basic-concepts.md
@@ -0,0 +1,46 @@
+### System timestamp and Application timestamp
+
+System timestamp denotes the time of backend cluster system. Application 
timestamp denotes the time at which message is generated. For example, for IoT 
edge device, the timestamp at which field sensor device creates a message is 
type of application timestamp, while the timestamp at which that message get 
received by the backend is type of system time.
+
+### Master, and Worker
+
+Gearpump follow master slave architecture. Every cluster contains one or more 
Master node, and several worker nodes. Worker node is responsible to manage 
local resources on single machine, and Master node is responsible to manage 
global resources of the whole cluster.
+
+![Actor Hierarchy](../img/actor_hierarchy.png)
+
+### Application
+
+Application is what we want to parallel and run on the cluster. There are 
different application types, for example MapReduce application and streaming 
application are different application types. Gearpump natively supports 
Streaming Application types, it also contains several templates to help user to 
create custom application types, like distributedShell.
+
+### AppMaster and Executor
+
+In runtime, every application instance is represented by a single AppMaster 
and a list of Executors. AppMaster represents the command and controls center 
of the Application instance. It communicates with user, master, worker, and 
executor to get the job done. Each executor is a parallel unit for distributed 
application. Typically AppMaster and Executor will be started as JVM processes 
on worker nodes.
+
+### Application Submission Flow
+
+When user submits an application to Master, Master will first find an 
available worker to start the AppMaster. After AppMaster is started, AppMaster 
will request Master for more resources (worker) to start executors. The 
Executor now is only an empty container. After the executors are started, the 
AppMaster will then distribute real computation tasks to the executor and run 
them in parallel way.
+
+To submit an application, a Gearpump client specifies a computation defined 
within a DAG and submits this to an active master. The SubmitApplication 
message is sent to the Master who then forwards this to an AppManager.
+
+![Submit App](../img/submit.png)
+Figure: User Submit Application
+
+The AppManager locates an available worker and launches an AppMaster in a 
sub-process JVM of the worker. The AppMaster will then negotiate with the 
Master for Resource allocation in order to distribute the DAG as defined within 
the Application. The allocated workers will then launch Executors (new JVMs).
+
+![Launch Executors and Tasks](../img/submit2.png)
+Figure: Launch Executors and Tasks
+
+### Streaming Topology, Processor, and Task
+
+For streaming application type, each application contains a topology, which is 
a DAG (directed acyclic graph) to describe the data flow. Each node in the DAG 
is a processor. For example, for word count it contains two processors, Split 
and Sum. The Split processor splits a line to a list of words, and then the Sum 
processor summarize the frequency of each word.
+An application is a DAG of processors. Each processor handles messages.
+
+![DAG](../img/dag.png)
+Figure: Processor DAG
+
+### Streaming Task and Partitioner
+
+For streaming application type, Task is the minimum unit of parallelism. In 
runtime, each Processor is paralleled to a list of tasks, with different tasks 
running in different executor. You can define Partitioner to denote the data 
shuffling rule between upstream processor tasks and downstream processor tasks.
+
+![Data Shuffle](../img/shuffle.png)
+Figure: Task Data Shuffling

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/commandline.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/commandline.md 
b/docs/docs/introduction/commandline.md
new file mode 100644
index 0000000..3dfe885
--- /dev/null
+++ b/docs/docs/introduction/commandline.md
@@ -0,0 +1,84 @@
+The commands can be found at: "bin" folder of Gearpump binary.
+
+**NOTE:** on MS Windows platform, please use window shell gear.bat script 
instead. bash script doesn't work well in cygwin/mingw.
+
+### Creating an uber-jar
+
+If you use Maven you can have a look 
[here](https://maven.apache.org/plugins/maven-shade-plugin/) whereas SBT users 
may find [this](https://github.com/sbt/sbt-assembly) useful.
+
+### Submit an new application
+
+You can use the command `gear` under the bin directory to submit, query and 
terminate an application:
+
+       :::bash
+       gear app [-namePrefix <application name prefix>] [-executors <number of 
executors to launch>] [-conf <custom gearpump config file>] -jar xx.jar 
MainClass <arg1> <arg2> ...
+       
+
+### List all running applications
+To list all running applications:
+
+       :::bash
+       gear info  [-conf <custom gearpump config file>]
+
+
+### Kill a running application
+To kill an application:
+
+       :::bash
+       gear kill -appid <application id>  [-conf <custom gearpump config file>]
+
+
+### Submit a storm application to Gearpump Cluster
+For example, to submit a storm application jar:
+
+       :::bash
+       storm -verbose -config storm.yaml -jar 
storm-starter-${STORM_VERSION}.jar storm.starter.ExclamationTopology exclamation
+       
+
+[Storm Compatibility Guide](../dev/dev-storm)
+
+### Start Gearpump Cluster on YARN
+To start a Gearpump Cluster on YARN, you can:
+
+       :::bash
+       yarnclient launch -package 
/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip
+
+`/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` 
should be available on HDFS.
+
+Please check [YARN Deployment Guide](../deployment/deployment-yarn) for more 
information.
+
+### Start a local cluster
+Masters and workers will be started in one machine:
+
+       :::bash
+       local
+       
+
+Check [Deployment Guide for Local Cluster](../deployment/deployment-local) for 
more information.
+
+### Start master daemons
+
+       :::bash
+       master -ip <Ip address> -port <port where this master is hooking>
+
+
+Please check [Deployment for Standalone 
mode](../deployment/deployment-standalone) for more information.
+
+### Start worker daemons
+
+       :::bash
+       worker
+
+
+Please check [Deployment for Standalone 
mode](../deployment/deployment-standalone) for more information.
+
+### Start UI server
+
+To start UI server, you can:
+
+       :::bash
+       services  [-master <host:port>]
+
+
+The default username and password is "admin:admin", you can check
+[UI Authentication](../deployment/deployment-ui-authentication) to find how to 
manage users.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/features.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/features.md 
b/docs/docs/introduction/features.md
new file mode 100644
index 0000000..4fcb66b
--- /dev/null
+++ b/docs/docs/introduction/features.md
@@ -0,0 +1,67 @@
+### Technical highlights of Gearpump
+
+Gearpump is a high performance, flexible, fault-tolerant, and responsive 
streaming platform with a lot of nice features, its technical highlights 
include:
+
+#### Actors everywhere
+
+The Actor model is a concurrency model proposed by Carl Hewitt at 1973. The 
Actor model is like a micro-service which is cohesive in the inside and 
isolated from other outside actors. Actors are the cornerstone of Gearpump, 
they provide facilities to do message passing, error handling, liveliness 
monitoring. Gearpump uses Actors everywhere; every entity within the cluster 
that can be treated as a service.
+
+![Actor Hierarchy](../img/actor_hierarchy.png)
+
+#### Exactly once Message Processing
+
+Exactly once is defined as: the effect of a message will be calculated only 
once in the persisted state and computation errors in the history will not be 
propagated to future computations.
+
+![Exact Once Semantics](../img/exact.png)
+
+#### Topology DAG DSL
+
+User can submit to Gearpump a computation DAG, which contains a list of nodes 
and edges, and each node can be parallelized to a set of tasks. Gearpump will 
then schedule and distribute different tasks in the DAG to different machines 
automatically. Each task will be started as an actor, which is long running 
micro-service.
+
+![DAG](../img/dag.png)
+
+#### Flow control
+
+Gearpump has built-in support for flow control. For all message passing 
between different tasks, the framework will assure the upstream tasks will not 
flood the downstream tasks.
+![Flow Control](../img/flowcontrol.png)
+
+#### No inherent end to end latency
+
+Gearpump is a message level streaming engine, which means every task in the 
DAG will process messages immediately upon receiving, and deliver messages to 
downstream immediately without waiting. Gearpump doesn't do batching when data 
sourcing.
+
+#### High Performance message passing
+
+By implementing smart batching strategies, Gearpump is extremely effective in 
transferring small messages. In one test of 4 machines, the whole cluster 
throughput can reach 18 million messages per second, with message size of 100 
bytes.
+![Dashboard](../img/dashboard.png)
+
+#### High availability, No single point of failure
+
+Gearpump has a careful design for high availability. We have considered 
message loss, worker machine crash, application crash, master crash, 
brain-split, and have made sure Gearpump recovers when these errors may occur. 
When there is message loss, the lost message will be replayed; when there is a 
worker machine crash or application crash, the related computation tasks will 
be rescheduled on new machines. For master high availability, several master 
nodes will form a Akka cluster, and CRDTs (conflict free data types) are used 
to exchange the state, so as long as there is still a quorum, the master will 
stay functional. When one master node fails, other master nodes in the cluster 
will take over and state will be recovered.
+
+![HA](../img/ha.png)
+
+#### Dynamic Computation DAG
+
+Gearpump provides a feature which allows the user to dynamically add, remove, 
or replace a sub graph at runtime, without the need to restart the whole 
computation topology.
+
+![Dynamic DAG](../img/dynamic.png)
+
+#### Able to handle out of order messages
+
+For a window operation like moving average on a sliding window, it is 
important to make sure we have received all messages in that time window so 
that we can get an accurate result, but how do we handle stranglers or late 
arriving messages? Gearpump solves this problem by tracking the low watermark 
of timestamp of all messages, so it knows whether we've received all the 
messages in the time window or not.
+
+![Clock](../img/clock.png)
+
+#### Customizable platform
+
+Different applications have different requirements related to performance 
metrics, some may want higher throughput, some may require strong eventual data 
consistency; and different applications have different resource requirements 
profiles, some may demand high CPU performance, some may require data locality. 
Gearpump meets these requirements by allowing the user to arbitrate between 
different performance metrics and define customized resource scheduling 
strategies.
+
+#### Built-in Dashboard UI
+
+Gearpump has a built-in dashboard UI to manage the cluster and visualize the 
applications. The UI uses REST calls to connect with backend, so it is easy to 
embed the UI within other dashboards.
+
+![Dashboard](../img/dashboard.gif)
+
+#### Data connectors for Kafka and HDFS
+
+Gearpump has built-in data connectors for Kafka and HDFS. For the Kafka 
connector, we support message replay from a specified timestamp.

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/gearpump-internals.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/gearpump-internals.md 
b/docs/docs/introduction/gearpump-internals.md
new file mode 100644
index 0000000..bc7e6bf
--- /dev/null
+++ b/docs/docs/introduction/gearpump-internals.md
@@ -0,0 +1,228 @@
+### Actor Hierarchy?
+
+![Actor Hierarchy](../img/actor_hierarchy.png)
+
+Everything in the diagram is an actor; they fall into two categories, Cluster 
Actors and Application Actors.
+
+#### Cluster Actors
+
+  **Worker**: Maps to a physical worker machine. It is responsible for 
managing resources and report metrics on that machine.
+
+  **Master**: Heart of the cluster, which manages workers, resources, and 
applications. The main function is delegated to three child actors, App 
Manager, Worker Manager, and Resource Scheduler.
+
+#### Application Actors:
+
+  **AppMaster**: Responsible to schedule the tasks to workers and manage the 
state of the application. Different applications have different AppMaster 
instances and are isolated.
+
+  **Executor**: Child of AppMaster, represents a JVM process. Its job is to 
manage the life cycle of tasks and recover the tasks in case of failure.
+
+  **Task**: Child of Executor, does the real job. Every task actor has a 
global unique address. One task actor can send data to any other task actors. 
This gives us great flexibility of how the computation DAG is distributed.
+
+  All actors in the graph are weaved together with actor supervision, and 
actor watching and every error is handled properly via supervisors. In a 
master, a risky job is isolated and delegated to child actors, so it's more 
robust. In the application, an extra intermediate layer "Executor" is created 
so that we can do fine-grained and fast recovery in case of task failure. A 
master watches the lifecycle of AppMaster and worker to handle the failures, 
but the life cycle of Worker and AppMaster are not bound to a Master Actor by 
supervision, so that Master node can fail independently.  Several Master Actors 
form an Akka cluster, the Master state is exchanged using the Gossip protocol 
in a conflict-free consistent way so that there is no single point of failure. 
With this hierarchy design, we are able to achieve high availability.
+
+### Application Clock and Global Clock Service
+
+Global clock service will track the minimum time stamp of all pending messages 
in the system. Every task will update its own minimum-clock to global clock 
service; the minimum-clock of task is decided by the minimum of:
+
+  - Minimum time stamp of all pending messages in the inbox.
+  - Minimum time stamp of all un-acked outgoing messages. When there is 
message loss, the minimum clock will not advance.
+  - Minimum clock of all task states. If the state is accumulated by a lot of 
input messages, then the clock value is decided by the oldest message's 
timestamp. The state clock will advance by doing snapshots to persistent 
storage or by fading out the effect of old messages.
+
+![Clock](../img/clock.png)
+
+The global clock service will keep track of all task minimum clocks 
effectively and maintain a global view of minimum clock. The global minimum 
clock value is monotonically increasing; it means that all source messages 
before this clock value have been processed. If there is message loss or task 
crash, the global minimum clock will stop.
+
+### How do we optimize the message passing performance?
+
+For streaming application, message passing performance is extremely important. 
For example, one streaming platform may need to process millions of messages 
per second with millisecond level latency. High throughput and low latency is 
not that easy to achieve. There are a number of challenges:
+
+#### First Challenge: Network is not efficient for small messages
+
+In streaming, typical message size is very small, usually less than 100 bytes 
per message, like the floating car GPS data. But network efficiency is very bad 
when transferring small messages. As you can see in below diagram, when message 
size is 50 bytes, it can only use 20% bandwidth. How to improve the throughput?
+
+![Throughput vs. Message Size](../img/through_vs_message_size.png)
+
+#### Second Challenge: Message overhead is too big
+
+For each message sent between two actors, it contains sender and receiver 
actor path. When sending over the wire, the overhead of this ActorPath is not 
trivial. For example, the below actor path takes more than 200 bytes.
+
+       :::bash
+       
akka.tcp://[email protected]:51582/remote/akka.tcp/[email protected]:48948/remote/akka.tcp/[email protected]:43676/user/master/Worker1/app_0_executor_0/group_1_task_0#-768886794
+
+
+#### How do we solve this?
+
+We implement a custom Netty transportation layer with Akka extension. In the 
below diagram, Netty Client will translate ActorPath to TaskId, and Netty 
Server will translate it back. Only TaskId will be passed on wire, it is only 
about 10 bytes, the overhead is minimized. Different Netty Client Actors are 
isolated; they will not block each other.
+
+![Netty Transport](../img/netty_transport.png)
+
+For performance, effective batching is really the key! We group multiple 
messages to a single batch and send it on the wire. The batch size is not 
fixed; it is adjusted dynamically based on network status. If the network is 
available, we will flush pending messages immediately without waiting; 
otherwise we will put the message in a batch and trigger a timer to flush the 
batch later.
+
+### How do we do flow Control?
+
+Without flow control, one task can easily flood another task with too many 
messages, causing out of memory error. Typical flow control will use a TCP-like 
sliding window, so that source and target can run concurrently without blocking 
each other.
+
+![Flow Control](../img/flow_control.png)
+Figure: Flow control, each task is "star" connected to input tasks and output 
tasks
+
+The difficult part for our problem is that each task can have multiple input 
tasks and output tasks. The input and output must be geared together so that 
the back pressure can be properly propagated from downstream to upstream. The 
flow control also needs to consider failures, and it needs to be able to 
recover when there is message loss.
+Another challenge is that the overhead of flow control messages can be big. If 
we ack every message, there will be huge amount of acked messages in the 
system, degrading streaming performance. The approach we adopted is to use 
explicit AckRequest message. The target tasks will only ack back when they 
receive the AckRequest message, and the source will only send AckRequest when 
it feels necessary. With this approach, we can largely reduce the overhead.
+
+### How do we detect message loss?
+
+For example, for web ads, we may charge for every click, we don't want to 
miscount.  The streaming platform needs to effectively track what messages have 
been lost, and recover as fast as possible.
+
+![Message Loss](../img/messageLoss.png)
+Figure: Message Loss Detection
+
+We use the flow control message AckRequest and Ack to detect message loss. The 
target task will count how many messages has been received since last 
AckRequest, and ack the count back to source task. The source task will check 
the count and find message loss.
+This is just an illustration, the real case is more difficulty, we need to 
handle zombie tasks, and in-the-fly stale messages.
+
+### How Gearpump know what messages to replay?
+
+In some applications, a message cannot be lost, and must be replayed. For 
example, during the money transfer, the bank will SMS us the verification code. 
If that message is lost, the system must replay it so that money transfer can 
continue. We made the decision to use **source end message storage** and **time 
stamp based replay**.
+
+![Replay](../img/replay.png)
+Figure: Replay with Source End Message Store
+
+Every message is immutable, and tagged with a timestamp. We have an assumption 
that the timestamp is approximately incremental (allow small ratio message 
disorder).
+
+We assume the message is coming from a replay-able source, like Kafka queue; 
otherwise the message will be stored at customizable source end "message 
store". When the source task sends the message downstream, the timestamp and 
offset of the message is also check-pointed to offset-timestamp storage 
periodically. During recovery, the system will first retrieve the right time 
stamp and offset from the offset-timestamp storage, then it will replay the 
message store from that time stamp and offset. A Timestamp Filter will filter 
out old messages in case the message in message store is not strictly 
time-ordered.
+
+### Master High Availability
+
+In a distributed streaming system, any part can fail. The system must stay 
responsive and do recovery in case of errors.
+
+![HA](../img/ha.png)
+Figure: Master High Availability
+
+We use Akka clustering to implement the Master high availability. The cluster 
consists of several master nodes, but no worker nodes. With clustering 
facilities, we can easily detect and handle the failure of master node crash. 
The master state is replicated on all master nodes with the Typesafe 
akka-data-replication  library, when one master node crashes, another standby 
master will read the master state and take over. The master state contains the 
submission data of all applications. If one application dies, a master can use 
that state to recover that application. CRDT LwwMap  is used to represent the 
state; it is a hash map that can converge on distributed nodes without 
conflict. To have strong data consistency, the state read and write must happen 
on a quorum of master nodes.
+
+### How we do handle failures?
+
+With Akka's powerful actor supervision, we can implement a resilient system 
relatively easy. In Gearpump, different applications have a different AppMaster 
instance, they are totally isolated from each other. For each application, 
there is a supervision tree, AppMaster->Executor->Task. With this supervision 
hierarchy, we can free ourselves from the headache of zombie process, for 
example if AppMaster is down, Akka supervisor will ensure the whole tree is 
shutting down.
+
+There are multiple possible failure scenarios
+
+![Failures](../img/failures.png)
+Figure: Possible Failure Scenarios and Error Supervision Hierarchy
+
+#### What happens when the Master crashes?
+
+In case of a master crash, other standby masters will be notified, they will 
resume the master state, and take over control. Worker and AppMaster will also 
be notified, They will trigger a process to find the new active master, until 
the resolution complete. If AppMaster or Worker cannot resolve a new Master in 
a time out, they will make suicide and kill themselves.
+
+#### What happens when a worker crashes?
+
+In case of a worker crash, the Master will get notified and stop scheduling 
new computation to this worker. All supervised executors on current worker will 
be killed, AppMaster can treat it as recovery of executor crash like [What 
happen when an executor crashes?](#what-happen-when-an-executor-crashes)
+
+#### What happens when the AppMaster crashes?
+
+If an AppMaster crashes, Master will schedule a new resource to create a new 
AppMaster Instance elsewhere, and then the AppMaster will handle the recovery 
inside the application. For streaming, it will recover the latest min clock and 
other state from disk, request resources from master to start executors, and 
restart the tasks with recovered min clock.
+
+#### What happen when an executor crashes?
+
+If an executor crashes, its supervisor AppMaster will get notified, and 
request a new resource from the active master to start a new executor, to run 
the tasks which were located on the crashed executor.
+
+#### What happen when tasks crash?
+
+If a task throws an exception, its supervisor executor will restart that Task.
+
+When "at least once" message delivery is enabled, it will trigger the message 
replaying in the case of message loss. First AppMaster will read the latest 
minimum clock from the global clock service(or clock storage if the clock 
service crashes), then AppMaster will restart all the task actors to get a 
fresh task state, then the source end tasks will replay messages from that 
minimum clock.
+
+### How does "exactly-once" message delivery work?
+
+For some applications, it is extremely important to do "exactly once" message 
delivery. For example, for a real-time billing system, we will not want to bill 
the customer twice. The goal of "exactly once" message delivery is to make sure:
+  The error doesn't accumulate, today's error will not be accumulated to 
tomorrow.
+  Transparent to application developer
+We use global clock to synchronize the distributed transactions. We assume 
every message from the data source will have a unique timestamp, the timestamp 
can be a part of the message body, or can be attached later with system clock 
when the message is injected into the streaming system. With this global 
synchronized clock, we can coordinate all tasks to checkpoint at same timestamp.
+
+![Checkpoint](../img/checkpointing.png)
+Figure: Checkpointing and Exactly-Once Message delivery
+
+Workflow to do state checkpointing:
+
+1. The coordinator asks the streaming system to do checkpoint at timestamp Tc.
+2. For each application task, it will maintain two states, checkpoint state 
and current state. Checkpoint state only contains information before timestamp 
Tc. Current state contains all information.
+3. When global minimum clock is larger than Tc, it means all messages older 
than Tc has been processed; the checkpoint state will no longer change, so we 
will then persist the checkpoint state to storage safely.
+4. When there is message loss, we will start the recovery process.
+5. To recover, load the latest checkpoint state from store, and then use it to 
restore the application status.
+6. Data source replays messages from the checkpoint timestamp.
+
+The checkpoint interval is determined by global clock service dynamically. 
Each data source will track the max timestamp of input messages. Upon receiving 
min clock updates, the data source will report the time delta back to global 
clock service. The max time delta is the upper bound of the application state 
timespan. The checkpoint interval is bigger than max delta time:
+
+![Checkpoint Equation](../img/checkpoint_equation.png)
+
+![Checkpointing Interval](../img/checkpointing_interval.png)
+Figure: How to determine Checkpoint Interval
+
+After the checkpoint interval is notified to tasks by global clock service, 
each task will calculate its next checkpoint timestamp autonomously without 
global synchronization.
+
+![Checkpoint Interval Equation](../img/checkpoint_interval_equation.png)
+
+For each task, it contains two states, checkpoint state and current state. The 
code to update the state is shown in listing below.
+
+       :::python
+       TaskState(stateStore, initialTimeStamp):
+         currentState = stateStore.load(initialTimeStamp)
+         checkpointState = currentState.clone
+         checkpointTimestamp = nextCheckpointTimeStamp(initialTimeStamp)
+       onMessage(msg):
+         if (msg.timestamp < checkpointTimestamp):
+           checkpointState.updateMessage(msg)
+         currentState.updateMessage(msg)  
+         maxClock = max(maxClock, msg.timeStamp)
+       
+       onMinClock(minClock):
+         if (minClock > checkpointTimestamp):
+           stateStore.persist(checkpointState)
+           checkpointTimeStamp = nextCheckpointTimeStamp(maxClock)
+           checkpointState = currentState.clone
+       
+       onNewCheckpointInterval(newStep):
+         step = newStep  
+       nextCheckpointTimeStamp(timestamp):
+         checkpointTimestamp = (1 + timestamp/step) * step
+       
+
+List 1: Task Transactional State Implementation
+
+### What is dynamic graph, and how it works?
+
+The DAG can be modified dynamically. We want to be able to dynamically add, 
remove, and replace a sub-graph.
+
+![Dynamic DAG](../img/dynamic.png)
+Figure: Dynamic Graph, Attach, Replace, and Remove
+
+## At least once message delivery and Kafka
+
+The Kafka source example project and tutorials can be found at:
+- [Kafka connector example 
project](https://github.com/apache/incubator-gearpump/tree/master/examples/streaming/kafka)
+- [Connect with Kafka source](../dev/dev-connectors)
+
+In this doc, we will talk about how the at least once message delivery works.
+
+We will use the WordCount example of [source 
tree](https://github.com/apache/incubator-gearpump/tree/master/examples/streaming/kafka)
  to illustrate.
+
+### How the kafka WordCount DAG looks like:
+
+It contains three processors:
+![Kafka WordCount](../img/kafka_wordcount.png)
+
+- KafkaStreamProducer(or KafkaSource) will read message from kafka queue.
+- Split will split lines to words
+- Sum will summarize the words to get a count for each word.
+
+### How to read data from Kafka
+
+We use KafkaSource, please check [Connect with Kafka 
source](../dev/dev-connectors) for the introduction.
+
+Please note that we have set a startTimestamp for the KafkaSource, which means 
KafkaSource will read from Kafka queue starting from messages whose timestamp 
is near startTimestamp.
+
+### What happen where there is Task crash or message loss?
+When there is message loss, the AppMaster will first pause the global clock 
service so that the global minimum timestamp no longer change, then it will 
restart the Kafka source tasks. Upon restart, Kafka Source will start to 
replay. It will first read the global minimum timestamp from AppMaster, and 
start to read message from that timestamp.
+
+### What method KafkaSource used to read messages from a start timestamp? As 
we know Kafka queue doesn't expose the timestamp information.
+
+Kafka queue only expose the offset information for each partition. What 
KafkaSource do is to maintain its own mapping from Kafka offset to  Application 
timestamp, so that we can map from a application timestamp to a Kafka offset, 
and replay Kafka messages from that Kafka offset.
+
+The mapping between Application timestamp with Kafka offset is stored in a 
distributed file system or as a Kafka topic.

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/message-delivery.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/message-delivery.md 
b/docs/docs/introduction/message-delivery.md
new file mode 100644
index 0000000..1ffeb9e
--- /dev/null
+++ b/docs/docs/introduction/message-delivery.md
@@ -0,0 +1,47 @@
+## What is At Least Once Message Delivery?
+
+Messages could be lost on delivery due to network partitions. **At Least Once 
Message Delivery** (at least once) means the lost messages are delivered one or 
more times such that at least one is processed and acknowledged by the whole 
flow. 
+
+Gearpump guarantees at least once for any source that is able to replay 
message from a past timestamp. In Gearpump, each message is tagged with a 
timestamp, and the system tracks the minimum timestamp of all pending messages 
(the global minimum clock). On message loss, application will be restarted to 
the global minimum clock. Since the source is able to replay from the global 
minimum clock, all pending messages before the restart will be replayed. 
Gearpump calls that kind of source `TimeReplayableSource` and already provides 
a built in
+[KafkaSource](gearpump-internals#at-least-once-message-delivery-and-kafka). 
With the KafkaSource to ingest data into Gearpump, users are guaranteed at 
least once message delivery.
+
+## What is Exactly Once Message Delivery?
+
+At least once delivery doesn't guarantee the correctness of the application 
result. For instance,  for a task keeping the count of received messages, there 
could be overcount with duplicated messages and the count is lost on task 
failure.
+ In that case, **Exactly Once Message Delivery** (exactly once) is required, 
where state is updated by a message exactly once. This further requires that 
duplicated messages are filtered out and in-memory states are persisted.
+
+Users are guaranteed exactly once in Gearpump if they use both a 
`TimeReplayableSource` to ingest data and the Persistent API to manage their in 
memory states. With the Persistent API, user state is periodically checkpointed 
by the system to a persistent store (e.g HDFS) along with its checkpointed 
time. Gearpump tracks the global minimum checkpoint timestamp of all pending 
states (global minimum checkpoint clock), which is persisted as well. On 
application restart, the system restores states at the global minimum 
checkpoint clock and source replays messages from that clock. This ensures that 
a message updates all states exactly once.
+
+### Persistent API
+Persistent API consists of `PersistentTask` and `PersistentState`.
+
+Here is an example of using them to keep count of incoming messages.
+
+       :::scala
+       class CountProcessor(taskContext: TaskContext, conf: UserConfig)
+         extends PersistentTask[Long](taskContext, conf) {
+
+         override def persistentState: PersistentState[Long] = {
+        import com.twitter.algebird.Monoid.longMonoid
+        new NonWindowState[Long](new AlgebirdMonoid(longMonoid), new 
ChillSerializer[Long])
+      }
+
+      override def processMessage(state: PersistentState[Long], message: 
Message): Unit = {
+        state.update(message.timestamp, 1L)
+      }
+    }
+
+   
+The `CountProcessor` creates a customized `PersistentState` which will be 
managed by `PersistentTask` and overrides the `processMessage` method to define 
how the state is updated on a new message (each new message counts as `1`, 
which is added to the existing value)
+
+Gearpump has already offered two types of states
+ 
+1. NonWindowState - state with no time or other boundary
+2. WindowState - each state is bounded by a time window
+
+They are intended for states that satisfy monoid laws.
+
+1. has binary associative operation, like `+`  
+2. has an identity element, like `0`
+
+In the above example, we make use of the `longMonoid` from [Twitter's 
Algebird](https://github.com/twitter/algebird) library which provides a bunch 
of useful monoids. 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/performance-report.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/performance-report.md 
b/docs/docs/introduction/performance-report.md
new file mode 100644
index 0000000..e4254d1
--- /dev/null
+++ b/docs/docs/introduction/performance-report.md
@@ -0,0 +1,34 @@
+# Performance Evaluation
+
+To illustrate the performance of Gearpump, we mainly focused on two aspects, 
throughput and latency, using a micro benchmark called SOL (an example in the 
Gearpump package) whose topology is quite simple. SOLStreamProducer delivers 
messages to SOLStreamProcessor constantly and SOLStreamProcessor does nothing. 
We set up a 4-nodes cluster with 10GbE network and each node's hardware is 
briefly shown as follows:
+
+Processor: 32 core Intel(R) Xeon(R) CPU E5-2690 2.90GHz
+Memory: 64GB
+
+## Throughput
+
+We tried to explore the upper bound of the throughput, after launching 48 
SOLStreamProducer and 48 SOLStreamProcessor the Figure below shows that the 
whole throughput of the cluster can reach about 18 million messages/second(100 
bytes per message)
+
+## Latency
+
+When we transfer message at the max throughput above, the average latency 
between two tasks is 8ms.
+
+## Fault Recovery time
+
+When the corruption is detected, for example the Executor is down, Gearpump 
will reallocate the resource and restart the application. It takes about 10 
seconds to recover the application.
+
+![Dashboard](../img/dashboard.png)
+
+## How to setup the benchmark environment?
+
+### Prepare the env
+
+1). Set up a 4-nodes Gearpump cluster with 10GbE network which have 4 Workers 
on each node. In our test environment, each node has 64GB memory and Intel(R) 
Xeon(R) 32-core processor E5-2690 2.90GHz. Make sure the metrics is enabled in 
Gearpump.
+
+2). Submit a SOL application with 48 StreamProducers and 48 StreamProcessors:
+
+    :::bash
+    bin/gear app -jar 
./examples/sol-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar 
-streamProducer 48 -streamProcessor 48
+
+
+3). Launch Gearpump's dashboard and browser http://$HOST:8090/, switch to the 
Applications tab and you can see the detail information of your application. 
The HOST should be the node runs dashboard.

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/docs/introduction/submit-your-1st-application.md
----------------------------------------------------------------------
diff --git a/docs/docs/introduction/submit-your-1st-application.md 
b/docs/docs/introduction/submit-your-1st-application.md
new file mode 100644
index 0000000..21cfaf2
--- /dev/null
+++ b/docs/docs/introduction/submit-your-1st-application.md
@@ -0,0 +1,39 @@
+Before you can submit and run your first Gearpump application, you will need a 
running Gearpump service.
+There are multiple ways to run Gearpump [Local 
mode](../deployment/deployment-local), [Standalone 
mode](../deployment/deployment-standalone), [YARN 
mode](../deployment/deployment-yarn) or [Docker 
mode](../deployment/deployment-docker).
+
+The easiest way is to run Gearpump in [Local 
mode](../deployment/deployment-local).
+Any Linux, MacOSX or Windows desktop can be used with zero configuration.
+
+In the example below, we assume your are running in [Local 
mode](../deployment/deployment-local).
+If you running Gearpump in one of the other modes, you will need to configure 
the Gearpump client to
+connect to the Gearpump service by setting the `gear.conf` configuration path 
in classpath.
+Within this file, you will need to change the parameter 
`gearpump.cluster.masters` to the correct Gearpump master(s).
+See [Configuration](../deployment/deployment-configuration) for details.
+
+## Steps to submit your first Application
+
+### Step 1: Submit application
+After the cluster is started, you can submit an example wordcount application 
to the cluster
+
+Open another shell,
+
+       :::bash
+       ### To run WordCount example
+       bin/gear app -jar 
examples/wordcount-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar 
org.apache.gearpump.streaming.examples.wordcount.WordCount
+
+
+###  Step 2: Congratulations, you've submitted your first application.
+
+To view the application status and metrics, start the Web UI services, and 
browse to [http://127.0.0.1:8090](http://127.0.0.1:8090) to check the status.
+The default username and password is "admin:admin", you can check
+[UI Authentication](../deployment/deployment-ui-authentication) to find how to 
manage users.
+
+![Dashboard](../img/dashboard.gif)
+
+**NOTE:** the UI port setting can be defined in configuration, please check 
section [Configuration](../deployment/deployment-configuration).
+
+## A quick Look at the Web UI
+TBD
+
+## Other Application Examples
+Besides wordcount, there are several other example applications. Please check 
the source tree examples/ for detail information.

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/faq.md
----------------------------------------------------------------------
diff --git a/docs/faq.md b/docs/faq.md
deleted file mode 100644
index 39bb7f2..0000000
--- a/docs/faq.md
+++ /dev/null
@@ -1,39 +0,0 @@
----
-layout: global
-displayTitle: Frequently Asked Questions
-title: faq
-description: Gearpump Frequently Asked Questions
----
-
-##### What's Relation between Gearpump and YARN?
-Gearpump can run on top of YARN as a YARN application. Gearpump's 
ApplicationMaster provides the application management, deployment and 
scheduling of DAGs after arbitrating and receiving container resources from YARN
-
-##### Relation with Storm and Spark Streaming
-Storm and spark streaming are proven platforms, and there are many production 
deployments. Compared with them, Gearpump is less proven and there is no 
production deployment yet. However, there is no single platform that can cover 
every use case; Gearpump has its own +1 points in some special fields. As an 
instance, for IoT use cases, Gearpump may be considered convenient because the 
topology can be deployed to edge device with feature of location transparency. 
For another example, when users want to upgrade the application online without 
service interruption, Gearpump may be suitable as it can dynamically modify the 
computation DAG on the fly.
-
-##### What does Gearpump mean?
-The name Gearpump is a reference to the engineering term "Gear Pump", which is 
a super simple pump that consists of only two gears, but is very powerful at 
streaming water from left to right.
-
-##### Why not using akka persistence to store the checkpoint file?
-1. We only checkpoint file to disk when necessary.(not record level)
-2. We have custom checkpoint file format
-
-##### Have you considered the akka stream API for the high level DSL?
-We are looking into a hands of candidates for what a good DSL should be. Akka 
stream API is one of the candidates.
-
-##### Why wrapping the Task, instead of using the Actor interface directly?
-
-1. It is easier to conduct Unit test
-2. We have custom logic and messages to ensure the data consistency, like flow 
control and message loss detection.
-3. As the Gearpump interface evolves rapidly, for now, we want to be 
conservative in exposing more powerful functions so that we don't tie our hands 
for future refactoring. It lets us feel safe.
-
-##### Why does my task has extremely high message latency (e.g. 10 seconds) ?
-
-Please check whether you are doing blocking jobs (e.g. sleep, IO) in your 
task. By default, all tasks in an executor share a thread pool. The blocking 
tasks could use up all the threads while other tasks don't get a chance to run. 
In that case, you can set `gearpump.task-dispatcher` to 
`"gearpump.single-thread-dispatcher"` in `gear.conf` such that a unique thread 
is dedicated to each task.
-
-Generally, we recommend use the default `share-thread-pool-dispatcher` which 
has better performance and only turn to the `single-thread-dispatcher` when you 
have to.
-
-##### Why can't I open Dashboard even if the Services process has been 
launched successfully ?
-
-By default, our Services process binds to a local **IPv6 port**. It's possible 
that another process on your system has already taken up the same **IPv4 
port**. You may check by `lsof -i -P | grep -i "Listen"` if your system is 
Unix/Linux. 
-

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/features.md
----------------------------------------------------------------------
diff --git a/docs/features.md b/docs/features.md
deleted file mode 100644
index 210ac7a..0000000
--- a/docs/features.md
+++ /dev/null
@@ -1,74 +0,0 @@
----
-layout: global
-displayTitle: Gearpump Technical Highlights
-title: highlights
-description: Gearpump Technical Highlights
----
-
-### Technical highlights of Gearpump
-
-Gearpump is a high performance, flexible, fault-tolerant, and responsive 
streaming platform with a lot of nice features, its technical highlights 
include:
-
-#### Actors everywhere
-
-The Actor model is a concurrency model proposed by Carl Hewitt at 1973. The 
Actor model is like a micro-service which is cohesive in the inside and 
isolated from other outside actors. Actors are the cornerstone of Gearpump, 
they provide facilities to do message passing, error handling, liveliness 
monitoring. Gearpump uses Actors everywhere; every entity within the cluster 
that can be treated as a service.
-
-![Actor Hierarchy](img/actor_hierarchy.png)
-
-#### Exactly once Message Processing
-
-Exactly once is defined as: the effect of a message will be calculated only 
once in the persisted state and computation errors in the history will not be 
propagated to future computations.
-
-![Exact Once Semantics](img/exact.png)
-
-#### Topology DAG DSL
-
-User can submit to Gearpump a computation DAG, which contains a list of nodes 
and edges, and each node can be parallelized to a set of tasks. Gearpump will 
then schedule and distribute different tasks in the DAG to different machines 
automatically. Each task will be started as an actor, which is long running 
micro-service.
-
-![DAG](img/dag.png)
-
-#### Flow control
-
-Gearpump has built-in support for flow control. For all message passing 
between different tasks, the framework will assure the upstream tasks will not 
flood the downstream tasks.
-![Flow Control](img/flowcontrol.png)
-
-#### No inherent end to end latency
-
-Gearpump is a message level streaming engine, which means every task in the 
DAG will process messages immediately upon receiving, and deliver messages to 
downstream immediately without waiting. Gearpump doesn't do batching when data 
sourcing.
-
-#### High Performance message passing
-
-By implementing smart batching strategies, Gearpump is extremely effective in 
transferring small messages. In one test of 4 machines, the whole cluster 
throughput can reach 18 million messages per second, with message size of 100 
bytes.
-![Dashboard](img/dashboard.png)
-
-#### High availability, No single point of failure
-
-Gearpump has a careful design for high availability. We have considered 
message loss, worker machine crash, application crash, master crash, 
brain-split, and have made sure Gearpump recovers when these errors may occur. 
When there is message loss, the lost message will be replayed; when there is a 
worker machine crash or application crash, the related computation tasks will 
be rescheduled on new machines. For master high availability, several master 
nodes will form a Akka cluster, and CRDTs (conflict free data types) are used 
to exchange the state, so as long as there is still a quorum, the master will 
stay functional. When one master node fails, other master nodes in the cluster 
will take over and state will be recovered.
-
-![HA](img/ha.png)
-
-#### Dynamic Computation DAG
-
-Gearpump provides a feature which allows the user to dynamically add, remove, 
or replace a sub graph at runtime, without the need to restart the whole 
computation topology.
-
-![Dynamic DAG](img/dynamic.png)
-
-#### Able to handle out of order messages
-
-For a window operation like moving average on a sliding window, it is 
important to make sure we have received all messages in that time window so 
that we can get an accurate result, but how do we handle stranglers or late 
arriving messages? Gearpump solves this problem by tracking the low watermark 
of timestamp of all messages, so it knows whether we've received all the 
messages in the time window or not.
-
-![Clock](img/clock.png)
-
-#### Customizable platform
-
-Different applications have different requirements related to performance 
metrics, some may want higher throughput, some may require strong eventual data 
consistency; and different applications have different resource requirements 
profiles, some may demand high CPU performance, some may require data locality. 
Gearpump meets these requirements by allowing the user to arbitrate between 
different performance metrics and define customized resource scheduling 
strategies.
-
-#### Built-in Dashboard UI
-
-Gearpump has a built-in dashboard UI to manage the cluster and visualize the 
applications. The UI uses REST calls to connect with backend, so it is easy to 
embed the UI within other dashboards.
-
-![Dashboard](img/dashboard.gif)
-
-#### Data connectors for Kafka and HDFS
-
-Gearpump has built-in data connectors for Kafka and HDFS. For the Kafka 
connector, we support message replay from a specified timestamp.

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/761e04c6/docs/fonts/glyphicons-halflings-regular.eot
----------------------------------------------------------------------
diff --git a/docs/fonts/glyphicons-halflings-regular.eot 
b/docs/fonts/glyphicons-halflings-regular.eot
deleted file mode 100644
index b93a495..0000000
Binary files a/docs/fonts/glyphicons-halflings-regular.eot and /dev/null differ

[06/11] incubator-gearpump git commit: [GEARPUMP-213] build docs with MkDocs

Reply via email to