[08/12] incubator-distributedlog git commit: Release 0.4.0-incubating

sijie Wed, 26 Apr 2017 11:42:18 -0700

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/bookkeeper.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/bookkeeper.rst 
b/website/docs/0.4.0-incubating/admin_guide/bookkeeper.rst
new file mode 100644
index 0000000..c60fbf1
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/bookkeeper.rst
@@ -0,0 +1,213 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 6
+top-nav-title: BookKeeper
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: bookkeeper
+sub-nav-pos: 6
+sub-nav-title: BookKeeper
+---
+
+.. contents:: BookKeeper
+
+BookKeeper
+==========
+
+For reliable BookKeeper service, you should deploy BookKeeper in a cluster.
+
+Run from bookkeeper source
+--------------------------
+
+The version of BookKeeper that DistributedLog depends on is not the official 
opensource version.
+It is twitter's production version `4.3.4-TWTTR`, which is available in 
`https://github.com/twitter/bookkeeper`. 
+We are working actively with BookKeeper community to merge all twitter's 
changes back to the community.
+
+The major changes in Twitter's bookkeeper includes:
+
+- BOOKKEEPER-670_: Long poll reads and LastAddConfirmed piggyback. It is to 
reduce the tailing read latency.
+- BOOKKEEPER-759_: Delay ensemble change if it doesn't break ack quorum 
constraint. It is to reduce the write latency on bookie failures.
+- BOOKKEEPER-757_: Ledger recovery improvements, to reduce the latency on 
ledger recovery.
+- Misc improvements on bookie recovery and bookie storage.
+
+.. _BOOKKEEPER-670: https://issues.apache.org/jira/browse/BOOKKEEPER-670
+.. _BOOKKEEPER-759: https://issues.apache.org/jira/browse/BOOKKEEPER-759
+.. _BOOKKEEPER-757: https://issues.apache.org/jira/browse/BOOKKEEPER-757
+
+To build bookkeeper, run:
+
+1. First checkout the bookkeeper source code from twitter's branch.
+
+.. code-block:: bash
+
+    $ git clone https://github.com/twitter/bookkeeper.git bookkeeper   
+
+
+2. Build the bookkeeper package:
+
+.. code-block:: bash
+
+    $ cd bookkeeper 
+    $ mvn clean package assembly:single -DskipTests
+
+However, since `bookkeeper-server` is one of the dependency of 
`distributedlog-service`.
+You could simply run bookkeeper using same set of scripts provided in 
`distributedlog-service`.
+In the following sections, we will describe how to run bookkeeper using the 
scripts provided in
+`distributedlog-service`.
+
+Run from distributedlog source
+------------------------------
+
+Build
++++++
+
+First of all, build DistributedLog:
+
+.. code-block:: bash
+
+    $ mvn clean install -DskipTests
+
+
+Configuration
++++++++++++++
+
+The configuration file `bookie.conf` under `distributedlog-service/conf` is a 
template of production
+configuration to run a bookie node. Most of the configuration settings are 
good for production usage.
+You might need to configure following settings according to your environment 
and hardware platform.
+
+Port
+^^^^
+
+By default, the service port is `3181`, where the bookie server listens on. 
You can change the port
+to whatever port you like by modifying the following setting.
+
+::
+
+    bookiePort=3181
+
+
+Disks
+^^^^^
+
+You need to configure following settings according to the disk layout of your 
hardware. It is recommended
+to put `journalDirectory` under a separated disk from others for performance. 
It is okay to set
+`indexDirectories` to be same as `ledgerDirectories`. However, it is 
recommended to put `indexDirectories`
+to a SSD driver for better performance.
+
+::
+    
+    # Directory Bookkeeper outputs its write ahead log
+    journalDirectory=/tmp/data/bk/journal
+
+    # Directory Bookkeeper outputs ledger snapshots
+    ledgerDirectories=/tmp/data/bk/ledgers
+
+    # Directory in which index files will be stored.
+    indexDirectories=/tmp/data/bk/ledgers
+
+
+To better understand how bookie nodes work, please check bookkeeper_ website 
for more details.
+
+ZooKeeper
+^^^^^^^^^
+
+You need to configure following settings to point the bookie to the zookeeper 
server that it is using.
+You need to make sure `zkLedgersRootPath` exists before starting the bookies.
+
+::
+   
+    # Root zookeeper path to store ledger metadata
+    # This parameter is used by zookeeper-based ledger manager as a root znode 
to
+    # store all ledgers.
+    zkLedgersRootPath=/messaging/bookkeeper/ledgers
+    # A list of one of more servers on which zookeeper is running.
+    zkServers=localhost:2181
+
+
+Stats Provider
+^^^^^^^^^^^^^^
+
+Bookies use `StatsProvider` to expose its metrics. The `StatsProvider` is a 
pluggable library to
+adopt to various stats collecting systems. Please check monitoring_ for more 
details.
+
+.. _monitoring: ./monitoring
+
+::
+    
+    # stats provide - use `codahale` metrics library
+    
statsProviderClass=org.apache.bookkeeper.stats.CodahaleMetricsServletProvider
+
+    ### Following settings are stats provider related settings
+
+    # Exporting codahale stats in http port `9001`
+    codahaleStatsHttpPort=9001
+
+
+Index Settings
+^^^^^^^^^^^^^^
+
+- `pageSize`: size of a index page in ledger cache, in bytes. If there are 
large number
+  of ledgers and each ledger has fewer entries, smaller index page would 
improve memory usage.
+- `pageLimit`: The maximum number of index pages in ledger cache. If nummber 
of index pages
+  reaches the limitation, bookie server starts to swap some ledgers from 
memory to disk.
+  Increase this value when swap becomes more frequent. But make sure 
`pageLimit*pageSize`
+  should not be more than JVM max memory limitation.
+
+
+Journal Settings
+^^^^^^^^^^^^^^^^
+
+- `journalMaxGroupWaitMSec`: The maximum wait time for group commit. It is 
valid only when
+  `journalFlushWhenQueueEmpty` is false.
+- `journalFlushWhenQueueEmpty`: Flag indicates whether to flush/sync journal. 
If it is `true`,
+  bookie server will sync journal when there is no other writes in the journal 
queue.
+- `journalBufferedWritesThreshold`: The maximum buffered writes for group 
commit, in bytes.
+  It is valid only when `journalFlushWhenQueueEmpty` is false.
+- `journalBufferedEntriesThreshold`: The maximum buffered writes for group 
commit, in entries.
+  It is valid only when `journalFlushWhenQueueEmpty` is false.
+
+Setting `journalFlushWhenQueueEmpty` to `true` will produce low latency when 
the traffic is low.
+However, the latency varies a lost when the traffic is increased. So it is 
recommended to set
+`journalMaxGroupWaitMSec`, `journalBufferedEntriesThreshold` and 
`journalBufferedWritesThreshold`
+to reduce the number of fsyncs made to journal disk, to achieve sustained low 
latency.
+
+Thread Settings
+^^^^^^^^^^^^^^^
+
+It is recommended to configure following settings to align with the cpu cores 
of the hardware.
+
+::
+    
+    numAddWorkerThreads=4
+    numJournalCallbackThreads=4
+    numReadWorkerThreads=4
+    numLongPollWorkerThreads=4
+
+Run 
++++
+
+As `bookkeeper-server` is shipped as part of `distributedlog-service`, you 
could use the `dlog-daemon.sh`
+script to start `bookie` as daemon thread.
+
+Start the bookie:
+
+.. code-block:: bash
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh start bookie --conf 
/path/to/bookie/conf
+
+
+Stop the bookie:
+
+.. code-block:: bash
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh stop bookie
+
+
+Please check bookkeeper_ website for more details.
+
+.. _bookkeeper: http://bookkeeper.apache.org/


http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/hardware.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/hardware.rst 
b/website/docs/0.4.0-incubating/admin_guide/hardware.rst
new file mode 100644
index 0000000..ab1dd91
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/hardware.rst
@@ -0,0 +1,138 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 3
+top-nav-title: Hardware
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: hardware
+sub-nav-pos: 3
+sub-nav-title: Hardware
+---
+
+.. contents:: Hardware
+
+Hardware
+========
+
+Figure 1 describes the data flow of DistributedLog. Write traffic comes to 
`Write Proxy`
+and the data is replicated in `RF` (replication factor) ways to `BookKeeper`. 
BookKeeper
+stores the replicated data and keeps the data for a given retention period. 
The data is
+read by `Read Proxy` and fanout to readers.
+
+In such layered architecture, each layer has its own responsibilities and 
different resource
+requirements. It makes the capacity and cost model much clear and users could 
scale
+different layers independently.
+
+.. figure:: ../images/costmodel.png
+   :align: center
+
+   Figure 1. DistributedLog Cost Model
+
+Metrics
+~~~~~~~
+
+There are different metrics measuring the capability of a service instance in 
each layer
+(e.g a `write proxy` node, a `bookie` storage node, a `read proxy` node and 
such). These metrics
+can be `rps` (requests per second), `bps` (bits per second), `number of 
streams` that a instance
+can support, and latency requirements. `bps` is the best and simple factor on 
measuring the
+capability of current distributedlog architecture.
+
+Write Proxy
+~~~~~~~~~~~
+
+Write Proxy (WP) is a stateless serving service that writes and replicates 
fan-in traffic into BookKeeper.
+The capability of a write proxy instance is purely dominated by the *OUTBOUND* 
network bandwidth,
+which is reflected as incoming `Write Throughput` and `Replication Factor`.
+
+Calculating the capacity of Write Proxy (number of instances of write proxies) 
is pretty straightforward.
+The formula is listed as below.
+
+::
+
+    Number of Write Proxies = (Write Throughput) * (Replication Factor) / 
(Write Proxy Outbound Bandwidth)
+
+As it is bandwidth bound, we'd recommend using machines that have high network 
bandwith (e.g 10Gb NIC).
+
+The cost estimation is also straightforward.
+
+::
+
+    Bandwidth TCO ($/day/MB) = (Write Proxy TCO) / (Write Proxy Outbound 
Bandwidth)
+    Cost of write proxies = (Write Throughput) * (Replication Factor) / 
(Bandwidth TCO)
+
+CPUs
+^^^^
+
+DistributedLog is not CPU bound. You can run an instance with 8 or 12 cores 
just fine.
+
+Memories
+^^^^^^^^
+
+There's a fair bit of caching. Consider running with at least 8GB of memory.
+
+Disks
+^^^^^
+
+This is a stateless process, disk performances are not relevant.
+
+Network
+^^^^^^^
+
+Depending on your throughput, you might be better off running this with 10Gb 
NIC. In this scenario, you can easily achieves 350MBps of writes.
+
+
+BookKeeper
+~~~~~~~~~~
+
+BookKeeper is the log segment store, which is a stateful service. There are 
two factors to measure the
+capability of a Bookie instance: `bandwidth` and `storage`. The bandwidth is 
majorly dominated by the
+outbound traffic from write proxy, which is `(Write Throughput) * (Replication 
Factor)`. The storage is
+majorly dominated by the traffic and also `Retention Period`.
+
+Calculating the capacity of BookKeeper (number of instances of bookies) is a 
bit more complicated than Write
+Proxy. The total number of instances is the maximum number of the instances of 
bookies calculated using
+`bandwidth` and `storage`.
+
+::
+
+    Number of bookies based on bandwidth = (Write Throughput) * (Replication 
Factor) / (Bookie Inbound Bandwidth)
+    Number of bookies based on storage = (Write Throughput) * (Replication 
Factor) * (Replication Factor) / (Bookie disk space)
+    Number of bookies = maximum((number of bookies based on bandwidth), 
(number of bookies based on storage))
+
+We should consider both bandwidth and storage when choosing the hardware for 
bookies. There are several rules to follow:
+- A bookie should have multiple disks.
+- The number of disks used as journal disks should have similar I/O bandwidth 
as its *INBOUND* network bandwidth. For example, if you plan to use a disk for 
journal which I/O bandwidth is around 100MBps, a 1Gb NIC is a better choice 
than 10Gb NIC.
+- The number of disks used as ledger disks should be large enough to hold data 
if retention period is typical long.
+
+The cost estimation is straightforward based on the number of bookies 
estimated above.
+
+::
+
+    Cost of bookies = (Number of bookies) * (Bookie TCO)
+
+Read Proxy
+~~~~~~~~~~
+
+Similar as Write Proxy, Read Proxy is also dominated by *OUTBOUND* bandwidth, 
which is reflected as incoming `Write Throughput` and `Fanout Factor`.
+
+Calculating the capacity of Read Proxy (number of instances of read proxies) 
is also pretty straightforward.
+The formula is listed as below.
+
+::
+
+    Number of Read Proxies = (Write Throughput) * (Fanout Factor) / (Read 
Proxy Outbound Bandwidth)
+
+As it is bandwidth bound, we'd recommend using machines that have high network 
bandwith (e.g 10Gb NIC).
+
+The cost estimation is also straightforward.
+
+::
+
+    Bandwidth TCO ($/day/MB) = (Read Proxy TCO) / (Read Proxy Outbound 
Bandwidth)
+    Cost of read proxies = (Write Throughput) * (Fanout Factor) / (Bandwidth 
TCO)
+

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/loadtest.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/loadtest.rst 
b/website/docs/0.4.0-incubating/admin_guide/loadtest.rst
new file mode 100644
index 0000000..c7591eb
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/loadtest.rst
@@ -0,0 +1,100 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 2
+top-nav-title: Load Test
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: performance
+sub-nav-pos: 2
+sub-nav-title: Load Test
+---
+
+.. contents:: Load Test
+
+Load Test
+=========
+
+Overview
+--------
+
+Under distributedlog-benchmark you will find a set of applications intended 
for generating large amounts of load in a distributedlog cluster. These 
applications are suitable for load testing, performance testing, benchmarking, 
or even simply smoke testing a distributedlog cluster.
+
+The dbench script can run in several modes:
+
+1. bkwrite - Benchmark the distributedlog write path using the core library
+
+2. write - Benchmark the distributedlog write path, via write proxy, using the 
thin client
+
+3. read - Benchmark the distributedlog read path using the core library
+
+
+Running Dbench
+--------------
+
+The distributedlog-benchmark binary dbench is intended to be run 
simultaneously from many machines with identical settings. Together, all 
instances of dbench comprise a benchmark job. How you launch a benchmark job 
will depend on your operating environment. We recommend using a cluster 
scheduler like aurora or kubernetes to simplify the process, but tools like 
capistrano can also simplify this process greatly.
+
+The benchmark script can be found at
+
+::
+
+    distributedlog-benchmark/bin/dbench
+
+Arguments may be passed to this script via environment variables. The 
available arguments depend on the execution mode. For an up to date list, check 
the script itself.
+
+
+Write to Proxy with Thin Client
+-------------------------------
+
+The proxy write test (mode = `write`) can be used to send writes to a proxy 
cluster to be written to a set of streams.
+
+For example to use the proxy write test to generate 10000 requests per second 
across 10 streams using 50 machines, run the following command on each machine.
+
+::
+
+    STREAM_NAME_PREFIX=loadtest_
+    BENCHMARK_DURATION=60 # minutes
+    DL_NAMESPACE=<dl namespace>
+    NUM_STREAMS=10
+    INITIAL_RATE=200
+    distributedlog-benchmark/bin/dbench write
+
+
+Write to BookKeeper with Core Library
+-------------------------------------
+
+The core library write test (mode = `bkwrite`) can be used to send writes to 
directly to bookkeeper using the core library.
+
+For example to use the core library write test to generate 100MBps across 10 
streams using 100 machines, run the following command on each machine.
+
+::
+
+    STREAM_NAME_PREFIX=loadtest_
+    BENCHMARK_DURATION=60 # minutes
+    DL_NAMESPACE=<dl namespace>
+    NUM_STREAMS=10
+    INITIAL_RATE=1024
+    MSG_SIZE=1024
+    distributedlog-benchmark/bin/dbench bkwrite
+
+
+Read from BookKeeper with Core Library
+--------------------------------------
+
+The core library read test (mode = `read`) can be used to read directly from 
bookkeeper using the core library.
+
+For example to use the core library read test to read from 10 streams on 100 
instances, run the following command on each machine.
+
+::
+
+    STREAM_NAME_PREFIX=loadtest_
+    BENCHMARK_DURATION=60 # minutes
+    DL_NAMESPACE=<dl namespace>
+    MAX_STREAM_ID=9
+    NUM_READERS_PER_STREAM=5
+    TRUNCATION_INTERVAL=60 # seconds
+    distributedlog-benchmark/bin/dbench read

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/main.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/main.rst 
b/website/docs/0.4.0-incubating/admin_guide/main.rst
new file mode 100644
index 0000000..5fb7a3a
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/main.rst
@@ -0,0 +1,13 @@
+---
+title: "Admin Guide"
+layout: guide
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-id: admin-guide
+sub-nav-parent: _root_
+sub-nav-group-title: Admin Guide
+sub-nav-pos: 1
+sub-nav-title: Admin Guide
+
+---

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/monitoring.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/monitoring.rst 
b/website/docs/0.4.0-incubating/admin_guide/monitoring.rst
new file mode 100644
index 0000000..59645c1
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/monitoring.rst
@@ -0,0 +1,398 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 4
+top-nav-title: Monitoring
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: monitoring
+sub-nav-pos: 4
+sub-nav-title: Monitoring
+---
+
+.. contents:: Monitoring
+
+Monitoring
+==========
+
+DistributedLog uses the stats library provided by Apache BookKeeper for 
reporting metrics in
+both the server and the client. This can be configured to report stats using 
pluggable stats
+provider to integrate with your monitoring system.
+
+Stats Provider
+~~~~~~~~~~~~~~
+
+`StatsProvider` is a provider that provides different kinds of stats logger 
for different scopes.
+The provider is also responsible for reporting its managed metrics.
+
+::
+
+    // Create the stats provider
+    StatsProvider statsProvider = ...;
+    // Start the stats provider
+    statsProvider.start(conf);
+    // Stop the stats provider
+    statsProvider.stop();
+
+Stats Logger
+____________
+
+A scoped `StatsLogger` is a stats logger that records 3 kinds of statistics
+under a given `scope`.
+
+A `StatsLogger` could be either created by obtaining from stats provider with
+the scope name:
+
+::
+
+    StatsProvider statsProvider = ...;
+    StatsLogger statsLogger = statsProvider.scope("test-scope");
+
+Or created by obtaining from a stats logger with a sub scope name:
+
+::
+
+    StatsLogger rootStatsLogger = ...;
+    StatsLogger subStatsLogger = rootStatsLogger.scope("sub-scope");
+
+All the metrics in a stats provider are managed in a hierarchical of scopes.
+
+::
+
+    // all stats recorded by `rootStatsLogger` are under 'root'
+    StatsLogger rootStatsLogger = statsProvider.scope("root");
+    // all stats recorded by 'subStatsLogger1` are under 'root/scope1'
+    StatsLogger subStatsLogger1 = statsProvider.scope("scope1");
+    // all stats recorded by 'subStatsLogger2` are under 'root/scope2'
+    StatsLogger subStatsLogger2 = statsProvider.scope("scope2");
+
+Counters
+++++++++
+
+A `Counter` is a cumulative metric that represents a single numerical value. A 
**counter**
+is typically used to count requests served, tasks completed, errors occurred, 
etc. Counters
+should not be used to expose current counts of items whose number can also go 
down, e.g.
+the number of currently running tasks. Use `Gauges` for this use case.
+
+To change a counter, use:
+
+::
+    
+    StatsLogger statsLogger = ...;
+    Counter births = statsLogger.getCounter("births");
+    // increment the counter
+    births.inc();
+    // decrement the counter
+    births.dec();
+    // change the counter by delta
+    births.add(-10);
+    // reset the counter
+    births.reset();
+
+Gauges
+++++++
+
+A `Gauge` is a metric that represents a single numerical value that can 
arbitrarily go up and down.
+
+Gauges are typically used for measured values like temperatures or current 
memory usage, but also
+"counts" that can go up and down, like the number of running tasks.
+
+To define a gauge, stick the following code somewhere in the initialization:
+
+::
+
+    final AtomicLong numPendingRequests = new AtomicLong(0L);
+    StatsLogger statsLogger = ...;
+    statsLogger.registerGauge(
+        "num_pending_requests",
+        new Gauge<Number>() {
+            @Override
+            public Number getDefaultValue() {
+                return 0;
+            }
+            @Override
+            public Number getSample() {
+                return numPendingRequests.get();
+            }
+        });
+
+The gauge must always return a numerical value when sampling.
+
+Metrics (OpStats)
++++++++++++++++++
+
+A `OpStats` is a set of metrics that represents the statistics of an 
`operation`. Those metrics
+include `success` or `failure` of the operations and its distribution (also 
known as `Histogram`).
+It is usually used for timing.
+
+::
+
+    StatsLogger statsLogger = ...;
+    OpStatsLogger writeStats = statsLogger.getOpStatsLogger("writes");
+    long writeLatency = ...;
+
+    // register success op
+    writeStats.registerSuccessfulEvent(writeLatency);
+
+    // register failure op
+    writeStats.registerFailedEvent(writeLatency);
+
+Available Stats Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All the available stats providers are listed as below:
+
+* Twitter Science Stats (deprecated)
+* Twitter Ostrich Stats (deprecated)
+* Twitter Finagle Stats
+* Codahale Stats
+
+Twitter Science Stats
+_____________________
+
+Use following dependency to enable Twitter science stats provider.
+
+::
+
+   <dependency>
+     <groupId>org.apache.bookkeeper.stats</groupId>
+     <artifactId>twitter-science-provider</artifactId>
+     <version>${bookkeeper.version}</version>
+   </dependency>
+
+Construct the stats provider for clients.
+
+::
+
+    StatsProvider statsProvider = new TwitterStatsProvider();
+    DistributedLogConfiguration conf = ...;
+
+    // starts the stats provider (optional)
+    statsProvider.start(conf);
+
+    // all the dl related stats are exposed under "dlog"
+    StatsLogger statsLogger = statsProvider.getStatsLogger("dlog");
+    DistributedLogNamespace namespace = 
DistributedLogNamespaceBuilder.newBuilder()
+        .uri(...)
+        .conf(conf)
+        .statsLogger(statsLogger)
+        .build();
+
+    ...
+
+    // stop the stats provider (optional)
+    statsProvider.stop();
+
+
+Expose the stats collected by the stats provider by configuring following 
settings:
+
+::
+
+    // enable exporting the stats
+    statsExport=true
+    // exporting the stats at port 8080
+    statsHttpPort=8080
+
+
+If exporting stats is enabled, all the stats are exported by the http endpoint.
+You could curl the http endpoint to check the stats.
+
+::
+
+    curl -s <host>:8080/vars
+
+
+check ScienceStats_ for more details.
+
+.. _ScienceStats: 
https://github.com/twitter/commons/tree/master/src/java/com/twitter/common/stats
+
+Twitter Ostrich Stats
+_____________________
+
+Use following dependency to enable Twitter ostrich stats provider.
+
+::
+
+   <dependency>
+     <groupId>org.apache.bookkeeper.stats</groupId>
+     <artifactId>twitter-ostrich-provider</artifactId>
+     <version>${bookkeeper.version}</version>
+   </dependency>
+
+Construct the stats provider for clients.
+
+::
+
+    StatsProvider statsProvider = new TwitterOstrichProvider();
+    DistributedLogConfiguration conf = ...;
+
+    // starts the stats provider (optional)
+    statsProvider.start(conf);
+
+    // all the dl related stats are exposed under "dlog"
+    StatsLogger statsLogger = statsProvider.getStatsLogger("dlog");
+    DistributedLogNamespace namespace = 
DistributedLogNamespaceBuilder.newBuilder()
+        .uri(...)
+        .conf(conf)
+        .statsLogger(statsLogger)
+        .build();
+
+    ...
+
+    // stop the stats provider (optional)
+    statsProvider.stop();
+
+
+Expose the stats collected by the stats provider by configuring following 
settings:
+
+::
+
+    // enable exporting the stats
+    statsExport=true
+    // exporting the stats at port 8080
+    statsHttpPort=8080
+
+
+If exporting stats is enabled, all the stats are exported by the http endpoint.
+You could curl the http endpoint to check the stats.
+
+::
+
+    curl -s <host>:8080/stats.txt
+
+
+check Ostrich_ for more details.
+
+.. _Ostrich: https://github.com/twitter/ostrich
+
+Twitter Finagle Metrics
+_______________________
+
+Use following dependency to enable bridging finagle stats receiver to 
bookkeeper's stats provider.
+All the stats exposed by the stats provider will be collected by finagle stats 
receiver and exposed
+by Twitter's admin service.
+
+::
+
+   <dependency>
+     <groupId>org.apache.bookkeeper.stats</groupId>
+     <artifactId>twitter-finagle-provider</artifactId>
+     <version>${bookkeeper.version}</version>
+   </dependency>
+
+Construct the stats provider for clients.
+
+::
+
+    StatsReceiver statsReceiver = ...; // finagle stats receiver
+    StatsProvider statsProvider = new FinagleStatsProvider(statsReceiver);
+    DistributedLogConfiguration conf = ...;
+
+    // the stats provider does nothing on start.
+    statsProvider.start(conf);
+
+    // all the dl related stats are exposed under "dlog"
+    StatsLogger statsLogger = statsProvider.getStatsLogger("dlog");
+    DistributedLogNamespace namespace = 
DistributedLogNamespaceBuilder.newBuilder()
+        .uri(...)
+        .conf(conf)
+        .statsLogger(statsLogger)
+        .build();
+
+    ...
+
+    // the stats provider does nothing on stop.
+    statsProvider.stop();
+
+
+check `finagle metrics library`__ for more details on how to expose the stats.
+
+.. _TwitterServer: https://twitter.github.io/twitter-server/Migration.html
+
+__ TwitterServer_
+
+Codahale Metrics
+________________
+
+Use following dependency to enable Twitter ostrich stats provider.
+
+::
+
+   <dependency>
+     <groupId>org.apache.bookkeeper.stats</groupId>
+     <artifactId>codahale-metrics-provider</artifactId>
+     <version>${bookkeeper.version}</version>
+   </dependency>
+
+Construct the stats provider for clients.
+
+::
+
+    StatsProvider statsProvider = new CodahaleMetricsProvider();
+    DistributedLogConfiguration conf = ...;
+
+    // starts the stats provider (optional)
+    statsProvider.start(conf);
+
+    // all the dl related stats are exposed under "dlog"
+    StatsLogger statsLogger = statsProvider.getStatsLogger("dlog");
+    DistributedLogNamespace namespace = 
DistributedLogNamespaceBuilder.newBuilder()
+        .uri(...)
+        .conf(conf)
+        .statsLogger(statsLogger)
+        .build();
+
+    ...
+
+    // stop the stats provider (optional)
+    statsProvider.stop();
+
+
+Expose the stats collected by the stats provider in different ways by 
configuring following settings.
+Check Codehale_ on how to configuring report endpoints.
+
+::
+
+    // How frequent report the stats
+    codahaleStatsOutputFrequencySeconds=...
+    // The prefix string of codahale stats
+    codahaleStatsPrefix=...
+
+    //
+    // Report Endpoints
+    //
+
+    // expose the stats to Graphite
+    codahaleStatsGraphiteEndpoint=...
+    // expose the stats to CSV files
+    codahaleStatsCSVEndpoint=...
+    // expose the stats to Slf4j logging
+    codahaleStatsSlf4jEndpoint=...
+    // expose the stats to JMX endpoint
+    codahaleStatsJmxEndpoint=...
+
+
+check Codehale_ for more details.
+
+.. _Codehale: https://dropwizard.github.io/metrics/3.1.0/
+
+Enable Stats Provider on Bookie Servers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The stats provider used by *Bookie Servers* is configured by setting the 
following option.
+
+::
+
+    // class of stats provider
+    statsProviderClass="org.apache.bookkeeper.stats.CodahaleMetricsProvider"
+
+Metrics
+~~~~~~~
+
+Check the Metrics_ reference page for the metrics exposed by DistributedLog.
+
+.. _Metrics: ../user_guide/references/metrics

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/operations.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/operations.rst 
b/website/docs/0.4.0-incubating/admin_guide/operations.rst
new file mode 100644
index 0000000..e56c596
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/operations.rst
@@ -0,0 +1,224 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 1
+top-nav-title: Operations
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: operations
+sub-nav-pos: 1
+sub-nav-title: Operations
+---
+
+.. contents:: DistributedLog Operations
+
+DistributedLog Operations
+=========================
+
+Feature Provider
+~~~~~~~~~~~~~~~~
+
+DistributedLog uses a `feature-provider` library provided by Apache BookKeeper 
for managing features
+dynamically at runtime. It is a feature-flag_ system used to proportionally 
control what features
+are enabled for the system. In other words, it is a way of altering the 
control in a system without
+restarting it. It can be used during all stages of development, its most 
visible use case is on
+production. For instance, during a production release, you can enable or 
disable individual features,
+control the data flow through the system, thereby minimizing risk of system 
failure in real time.
+
+.. _feature-flag: https://en.wikipedia.org/wiki/Feature_toggle
+
+This `feature-provider` interface is pluggable and easy to integrate with any 
configuration management
+system.
+
+API
+___
+
+`FeatureProvider` is a provider that manages features under different scopes. 
The provider is responsible
+for loading features dynamically at runtime. A `Feature` is a numeric flag 
that control how much percentage
+of this feature will be available to the system - the number is called 
`availability`.
+
+::
+
+    Feature.name() => returns the name of this feature
+    Feature.availability() => returns the availability of this feature
+    Feature.isAvailable() => returns true if its availability is larger than 
0; otherwise false
+
+
+It is easy to obtain a feature from the provider by just providing a feature 
name.
+
+::
+
+    FeatureProvider provider = ...;
+    Feature feature = provider.getFeature("feature1"); // returns the feature 
named 'feature1'
+
+    
+The `FeatureProvider` is scopable to allow creating features in a hierarchical 
way. For example, if a system
+is comprised of two subsystems, one is *cache*, while the other one is 
*storage*. so the features belong to
+different subsystems can be created under different scopes.
+
+::
+
+    FeatureProvider provider = ...;
+    FeatureProvider cacheFeatureProvider = provider.scope("cache");
+    FeatureProvider storageFeatureProvider = provider.scope("storage");
+    Feature writeThroughFeature = 
cacheFeatureProvider.getFeature("write_through");
+    Feature duralWriteFeature = 
storageFeatureProvider.getFeature("dural_write");
+
+    // so the available features under `provider` are: (assume scopes are 
separated by '.')
+    // - 'cache.write_through'
+    // - 'storage.dural_write'
+
+
+The feature provider could be passed to `DistributedLogNamespaceBuilder` when 
building the namespace,
+thereby it would be used for controlling the features exposed under 
`DistributedLogNamespace`.
+
+::
+
+    FeatureProvider rootProvider = ...;
+    FeatureProvider dlFeatureProvider = rootProvider.scope("dlog");
+    DistributedLogNamespace namespace = 
DistributedLogNamespaceBuilder.newBuilder()
+        .uri(uri)
+        .conf(conf)
+        .featureProvider(dlFeatureProvider)
+        .build();
+
+
+The feature provider is loaded by reflection on distributedlog write proxy 
server. You could specify
+the feature provider class name as below. Otherwise it would use 
`DefaultFeatureProvider`, which disables
+all the features by default.
+
+::
+
+    
featureProviderClass=org.apache.distributedlog.feature.DynamicConfigurationFeatureProvider
+
+
+
+Configuration Based Feature Provider
+____________________________________
+
+Beside `DefaultFeatureProvider`, distributedlog also provides a file-based 
feature provider - it loads
+the features from properties files.
+
+All the features and their availabilities are configured in properties file 
format. For example,
+
+::
+
+    cache.write_through=100
+    storage.dural_write=0
+
+
+You could configure `featureProviderClass` in distributedlog configuration 
file by setting it to
+`org.apache.distributedlog.feature.DynamicConfigurationFeatureProvider` to 
enable file-based feature
+provider. The feature provider will load the features from two files, one is 
base config file configured
+by `fileFeatureProviderBaseConfigPath`, while the other one is overlay config 
file configured by
+`fileFeatureProviderOverlayConfigPath`. Current implementation doesn't 
differentiate these two files
+too much other than the `overlay` config will override the settings in `base` 
config. It is recommended
+to have a base config file for storing the default availability values for 
your system and dynamically
+adjust the availability values in overlay config file.
+
+::
+
+    
featureProviderClass=org.apache.distributedlog.feature.DynamicConfigurationFeatureProvider
+    fileFeatureProviderBaseConfigPath=/path/to/base/config
+    fileFeatureProviderOverlayConfigPath=/path/to/overlay/config
+    // how frequent we reload the config files
+    dynamicConfigReloadIntervalSec=60
+
+
+Available Features
+__________________
+
+Check the Features_ reference page for the features exposed by DistributedLog.
+
+.. _Features: ../user_guide/references/features
+
+`dlog`
+~~~~~~
+
+A CLI is provided for inspecting DistributedLog streams and metadata.
+
+.. code:: bash
+
+   dlog
+   JMX enabled by default
+   Usage: dlog <command>
+   where command is one of:
+       local               Run distributedlog sandbox
+       example             Run distributedlog example
+       tool                Run distributedlog tool
+       proxy_tool          Run distributedlog proxy tool to interact with 
proxies
+       balancer            Run distributedlog balancer
+       admin               Run distributedlog admin tool
+       help                This help message
+
+   or command is the full name of a class with a defined main() method.
+
+   Environment variables:
+       DLOG_LOG_CONF        Log4j configuration file (default 
$HOME/src/distributedlog/distributedlog-service/conf/log4j.properties)
+       DLOG_EXTRA_OPTS      Extra options to be passed to the jvm
+       DLOG_EXTRA_CLASSPATH Add extra paths to the dlog classpath
+
+These variable can also be set in conf/dlogenv.sh
+
+Create a stream
+_______________
+
+To create a stream:
+
+.. code:: bash
+
+   dlog tool create -u <DL URI> -r <STREAM PREFIX> -e <STREAM EXPRESSION>
+
+
+List the streams
+________________
+
+To list all the streams under a given DistributedLog namespace:
+
+.. code:: bash
+
+   dlog tool list -u <DL URI>
+
+Show stream's information
+_________________________
+
+To view the metadata associated with a stream:
+
+.. code:: bash
+
+   dlog tool show -u <DL URI> -s <STREAM NAME>
+
+
+Dump a stream
+_____________
+
+To dump the items inside a stream:
+
+.. code:: bash
+
+   dlog tool dump -u <DL URI> -s <STREAM NAME> -o <START TXN ID> -l <NUM 
RECORDS>
+
+Delete a stream
+_______________
+
+To delete a stream, run:
+
+.. code:: bash
+
+   dlog tool delete -u <DL URI> -s <STREAM NAME>
+
+
+Truncate a stream
+_________________
+
+Truncate the streams under a given DistributedLog namespace. You could specify 
a filter to match the streams that you want to truncate.
+
+There is a difference between the ``truncate`` and ``delete`` command. When 
you issue a ``truncate``, the data will be purge without removing the streams. 
A ``delete`` will delete the stream. You can pass the flag ``-delete`` to the 
``truncate`` command to also delete the streams.
+
+.. code:: bash
+
+   dlog tool truncate -u <DL URI>

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/performance.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/performance.rst 
b/website/docs/0.4.0-incubating/admin_guide/performance.rst
new file mode 100644
index 0000000..fa36660
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/performance.rst
@@ -0,0 +1,22 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 2
+top-nav-title: Performance Tuning
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: performance
+sub-nav-pos: 2
+sub-nav-title: Performance Tuning
+---
+
+.. contents:: Performance Tuning
+
+Performance Tuning
+==================
+
+(describe how to tune performance, critical settings)

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/vagrant.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/vagrant.rst 
b/website/docs/0.4.0-incubating/admin_guide/vagrant.rst
new file mode 100644
index 0000000..e4ed574
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/vagrant.rst
@@ -0,0 +1,18 @@
+Sample Deployment using vagrant
+================================
+
+This file explains vagrant deployment.
+
+Prerequesites
+--------------
+1. Vagrant: From https://www.vagrantup.com/downloads.html
+2. vagrant-hostmanager plugin: From 
https://github.com/devopsgroup-io/vagrant-hostmanager
+
+Steps
+-----
+1. Create a snapshot using ./scripts/snapshot
+2. Run vagrant up from the root directory of the enlistment
+3. Vagrant brings up a zookeepers with machine names: zk1,zk2, .... with IP 
addresses 192.168.50.11,192.168.50.12,....
+4. Vagrant brings the bookies with machine names: bk1,bk2, ... with IP 
addresses 192.168.50.51,192.168.50.52,....
+5. The script will also start writeproxies at 
distributedlog://$PUBLIC_ZOOKEEPER_ADDRESSES/messaging/distributedlog/mynamespace
 as the namespace. If you want it to point to a different namespace/port/shard, 
please update the ./vagrant/bk.sh script.
+6. If you want to run the client on the host machine, please add these node 
names and their IP addresses to the /etc/hosts file on the host.

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/admin_guide/zookeeper.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/admin_guide/zookeeper.rst 
b/website/docs/0.4.0-incubating/admin_guide/zookeeper.rst
new file mode 100644
index 0000000..0fee9de
--- /dev/null
+++ b/website/docs/0.4.0-incubating/admin_guide/zookeeper.rst
@@ -0,0 +1,104 @@
+---
+layout: default
+
+# Top navigation
+top-nav-group: admin-guide
+top-nav-pos: 5
+top-nav-title: ZooKeeper
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-parent: admin-guide
+sub-nav-id: zookeeper
+sub-nav-pos: 5
+sub-nav-title: ZooKeeper
+---
+
+ZooKeeper
+=========
+
+To run a DistributedLog ensemble, you'll need a set of Zookeeper
+nodes. There is no constraints on the number of Zookeeper nodes you
+need. One node is enough to run your cluster, but for reliability
+purpose, you should run at least 3 nodes.
+
+Version
+-------
+
+DistributedLog leverages zookeepr `multi` operations for metadata updates.
+So the minimum version of zookeeper is 3.4.*. We recommend to run stable
+zookeeper version `3.4.8`.
+
+Run ZooKeeper from distributedlog source
+----------------------------------------
+
+Since `zookeeper` is one of the dependency of `distributedlog-service`. You 
could simply
+run `zookeeper` servers using same set of scripts provided in 
`distributedlog-service`.
+In the following sections, we will describe how to run zookeeper using the 
scripts provided
+in `distributedlog-service`.
+
+Build
++++++
+
+First of all, build DistributedLog:
+
+.. code-block:: bash
+
+    $ mvn clean install -DskipTests
+
+Configuration
++++++++++++++
+
+The configuration file `zookeeper.conf.template` under 
`distributedlog-service/conf` is a template of
+production configuration to run a zookeeper node. Most of the configuration 
settings are good for
+production usage. You might need to configure following settings according to 
your environment and
+hardware platform.
+
+Ensemble
+^^^^^^^^
+
+You need to configure the zookeeper servers form this ensemble as below:
+
+::
+    
+    server.1=127.0.0.1:2710:3710:participant;0.0.0.0:2181
+
+
+Please check zookeeper_ website for more configurations.
+
+Disks
+^^^^^
+
+You need to configure following settings according to the disk layout of your 
hardware.
+It is recommended to put `dataLogDir` under a separated disk from others for 
performance.
+
+::
+    
+    # the directory where the snapshot is stored.
+    dataDir=/tmp/data/zookeeper
+    
+    # where txlog  are written
+    dataLogDir=/tmp/data/zookeeper/txlog
+
+
+Run
++++
+
+As `zookeeper` is shipped as part of `distributedlog-service`, you could use 
the `dlog-daemon.sh`
+script to start `zookeeper` as daemon thread.
+
+Start the zookeeper:
+
+.. code-block:: bash
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh start zookeeper 
/path/to/zookeeper.conf
+
+Stop the zookeeper:
+
+.. code-block:: bash
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh stop zookeeper
+
+Please check zookeeper_ website for more details.
+
+.. _zookeeper: http://zookeeper.apache.org/

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/basics/introduction.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/basics/introduction.rst 
b/website/docs/0.4.0-incubating/basics/introduction.rst
new file mode 100644
index 0000000..13ee120
--- /dev/null
+++ b/website/docs/0.4.0-incubating/basics/introduction.rst
@@ -0,0 +1,138 @@
+---
+# Top navigation
+top-nav-group: user-guide
+top-nav-pos: 1
+top-nav-title: Introduction
+
+# Sub-level navigation
+sub-nav-group: user-guide
+sub-nav-parent: user-guide
+sub-nav-id: introduction
+sub-nav-pos: 1
+sub-nav-title: Introduction
+
+layout: default
+---
+
+.. contents:: DistributedLog Overview
+
+Introduction
+============
+
+DistributedLog (DL) is a high performance replicated log service.
+It offers durability, replication and strong consistency, which provides a 
fundamental building block
+for building reliable distributed systems, e.g replicated-state-machines, 
general pub/sub systems,
+distributed databases, distributed queues and etc.
+
+DistributedLog maintains sequences of records in categories called *Logs* (aka 
*Log Streams*).
+The processes that write records to a DL log are *writers*, while the 
processes that read
+from logs and process the records are *readers*.
+
+
+.. figure:: ../images/softwarestack.png
+   :align: center
+
+   Figure 1. DistributedLog Software Stack
+
+Logs
+----
+
+A **log** is an ordered, immutable sequence of *log records*.
+
+.. figure:: ../images/datamodel.png
+   :align: center
+
+   Figure 2. Anatomy of a log stream
+
+Log Records
+~~~~~~~~~~~
+
+Each **log record** is a sequence of bytes.
+**Log records** are written sequentially into a *log stream*, and will be 
assigned with
+a unique sequence number *called* **DLSN** (DistributedLog Sequence Number). 
Besides *DLSN*,
+applications could assign its own sequence number while constructing log 
records. The
+application defined sequence number is called **TransactionID** (*txid*). 
Either *DLSN*
+or *TransactionID* could be used for positioning readers to start reading from 
a specific
+*log record*.
+
+Log Segments
+~~~~~~~~~~~~
+
+A **log** is broken down into *segments*, which each log segment contains its 
subset of
+records. **Log segments** are distributed and stored in a log segment store 
(e.g Apache BookKeeper).
+DistributedLog rolls the log segments based on configured rolling policy - 
either a configurable
+period of time (e.g. every 2 hours) or a configurable maximum size (e.g. every 
128MB).
+So the data of logs will be divided into equal-sized *log segments* and 
distributed evenly
+across log segment storage nodes. It allows the log to scale beyond a size 
that will fit on
+a single server and also spread read traffic among the cluster.
+
+The data of logs will either be kept forever until application *explicitly* 
truncates or be retained
+for a configurable period of time. **Explicit Truncation** is useful for 
building replicated
+state machines such as distributed databases. They usually require strong 
controls over when
+the data could be truncated. **Time-based Retention** is useful for real-time 
analytics. They only
+care about the data within a period of time.
+
+Namespaces
+~~~~~~~~~~
+
+The *log streams* belong to same organization are usually categorized and 
managed under
+a **namespace**. A DL **namespace** is basically for applications to locate 
where the
+*log streams* are. Applications could *create* and *delete* streams under a 
namespace,
+and also be able to *truncate* a stream to given sequence number (either 
*DLSN* or *TransactionID*).
+
+Writers
+-------
+
+Writers write data into the logs of their choice. All the records are
+appended into the logs in order. The sequencing is done by the writer,
+which means there is only one active writer for a log at a given time.
+DL guarantees correctness when two writers attempt writing to
+to a same log when network partition happens - via fencing mechanism
+in log segment store.
+
+The log writers are served and managed in a service tier called *Write Proxy*.
+The *Write Proxy* is used for accepting fan-in writes from large number
+of clients. Details on **Fan-in and Fan-out** can be found further into this 
doc.
+
+Readers
+-------
+
+Readers read records from the logs of their choice, starting from a provided
+position. The provided position could be either *DLSN* or *TransactionID*.
+The readers will read records in strict order from the logs. Different readers
+could read records starting from different positions in a same log.
+
+Unlike other pub/sub systems, DistributedLog doesn't record/manage readers' 
positions.
+It leaves the tracking responsibility to applications, as different 
applications
+might have different requirements on tracking and coordinating positions. It 
is hard
+to get it right with a single approach. For example, distributed databases 
might store
+the reader positions along with SSTables, so they would resume applying 
transactions
+from the positions stored in SSTables. Tracking reader positions could easily 
be done
+in application level using various stores (e.g. ZooKeeper, file system, or 
key/value stores).
+
+The log records could be cached in a service tier called *Read Proxy*, to serve
+a large number of readers.
+
+Fan-in and Fan-out
+------------------
+
+The core of DistributedLog supports single-writer, multiple-readers semantics. 
The service layer
+built on top of the *DistributedLog Core* to support large scale of number of 
writers and readers.
+The service layer includes **Write Proxy** and **Read Proxy**. **Write Proxy** 
manages
+the writers of logs and fail over them when machines are failed. It allows 
supporting
+which don't care about the log ownership by aggregating writes from many 
sources (aka *Fan-in*).
+**Read Proxy** optimize reader path by caching log records in cases where 
hundreds or
+thousands of readers are consuming a same log stream.
+
+Guarantees
+----------
+
+At a high level, DistributedLog gives the following guarantees:
+
+* Records written by a writer to a log will be appended in the order they are 
written. That is, if a record *R1* is written by same writer as a record *R2*, 
*R1* will have a smaller sequence number than *R2*.
+* Readers see records in the same order they were written to the log.
+* All records are persisted on disk before acknowledgments, to gurantee 
durability.
+* For a log with replication factor of N, DistributedLog tolerates up to N-1 
server failures without losing any records.
+
+More details on these guarantees are given in the [design 
section](http://distributedlog.incubator.apache.org/docs/0.4.0-incubating/user_guide/design/main.html).
+

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/css/main.scss
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/css/main.scss 
b/website/docs/0.4.0-incubating/css/main.scss
new file mode 100644
index 0000000..f2e566e
--- /dev/null
+++ b/website/docs/0.4.0-incubating/css/main.scss
@@ -0,0 +1,53 @@
+---
+# Only the main Sass file needs front matter (the dashes are enough)
+---
+@charset "utf-8";
+
+
+
+// Our variables
+$base-font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
+$base-font-size:   16px;
+$base-font-weight: 400;
+$small-font-size:  $base-font-size * 0.875;
+$base-line-height: 1.5;
+
+$spacing-unit:     30px;
+
+$text-color:       #111;
+$background-color: #fdfdfd;
+$brand-color:      #2a7ae2;
+
+$grey-color:       #828282;
+$grey-color-light: lighten($grey-color, 40%);
+$grey-color-dark:  darken($grey-color, 25%);
+
+// Width of the content area
+$content-width:    800px;
+
+$on-palm:          600px;
+$on-laptop:        800px;
+
+
+
+// Use media queries like this:
+// @include media-query($on-palm) {
+//     .wrapper {
+//         padding-right: $spacing-unit / 2;
+//         padding-left: $spacing-unit / 2;
+//     }
+// }
+@mixin media-query($device) {
+    @media screen and (max-width: $device) {
+        @content;
+    }
+}
+
+
+
+// Import partials from `sass_dir` (defaults to `_sass`)
+@import
+        "base",
+        "layout",
+        "syntax-highlighting"
+;

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/css/theme.css
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/css/theme.css 
b/website/docs/0.4.0-incubating/css/theme.css
new file mode 100644
index 0000000..c0dd5a4
--- /dev/null
+++ b/website/docs/0.4.0-incubating/css/theme.css
@@ -0,0 +1,21 @@
+body {
+  padding-top: 70px;
+  padding-bottom: 30px;
+  font-family: 'Roboto', sans-serif;
+}
+
+.theme-dropdown .dropdown-menu {
+  position: static;
+  display: block;
+  margin-bottom: 20px;
+}
+
+.theme-showcase > p > .btn {
+  margin: 5px 0;
+}
+
+.theme-showcase .navbar .container {
+  width: auto;
+}
+
+@import url(https://fonts.googleapis.com/css?family=Roboto:400,300);

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/deployment/cluster.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/deployment/cluster.rst 
b/website/docs/0.4.0-incubating/deployment/cluster.rst
new file mode 100644
index 0000000..b4caf3f
--- /dev/null
+++ b/website/docs/0.4.0-incubating/deployment/cluster.rst
@@ -0,0 +1,561 @@
+---
+title: Cluster Setup
+
+top-nav-group: deployment
+top-nav-pos: 1
+top-nav-title: Cluster Setup
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-id: cluster-deployment
+sub-nav-parent: admin-guide
+sub-nav-group-title: Cluster Setup
+sub-nav-pos: 1
+sub-nav-title: Cluster Setup
+
+layout: default
+---
+
+.. contents:: This page provides instructions on how to run **DistributedLog** 
in a fully distributed fashion.
+
+Cluster Setup & Deployment
+==========================
+
+This section describes how to run DistributedLog in `distributed` mode.
+To run a cluster with DistributedLog, you need a Zookeeper cluster and a 
Bookkeeper cluster.
+
+Build
+-----
+
+To build DistributedLog, run:
+
+.. code-block:: bash
+
+   mvn clean install -DskipTests
+
+
+Or run `./scripts/snapshot` to build the release packages from current source. 
The released
+packages contain the binaries for running `distributedlog-service`, 
`distributedlog-benchmark`
+and `distributedlog-tutorials`.
+
+NOTE: we run the following instructions from distributedlog source code after
+running `mvn clean install`.  And assume `DL_HOME` is the directory of
+distributedlog source.
+
+Zookeeper
+---------
+
+(If you already have a zookeeper cluster running, you could skip this section.)
+
+We could use the `dlog-daemon.sh` and the `zookeeper.conf.template` to 
demonstrate run a 1-node
+zookeeper ensemble locally.
+
+Create a `zookeeper.conf` from the `zookeeper.conf.template`.
+
+.. code-block:: bash
+
+    $ cp distributedlog-service/conf/zookeeper.conf.template 
distributedlog-service/conf/zookeeper.conf
+
+Configure the settings in `zookeeper.conf`. By default, it will use 
`/tmp/data/zookeeper` for storing
+the zookeeper data. Let's create the data directories for zookeeper.
+
+.. code-block:: bash
+
+    $ mkdir -p /tmp/data/zookeeper/txlog
+
+Once the data directory is created, we need to assign `myid` for this 
zookeeper node.
+
+.. code-block:: bash
+
+    $ echo "1" > /tmp/data/zookeeper/myid
+
+Start the zookeeper daemon using `dlog-daemon.sh`.
+
+.. code-block:: bash
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh start zookeeper 
${DL_HOME}/distributedlog-service/conf/zookeeper.conf
+
+You could verify the zookeeper setup using `zkshell`.
+
+.. code-block:: bash
+
+    // ./distributedlog-service/bin/dlog zkshell ${zkservers}
+    $ ./distributedlog-service/bin/dlog zkshell localhost:2181
+    Connecting to localhost:2181
+    Welcome to ZooKeeper!
+    JLine support is enabled
+
+    WATCHER::
+
+    WatchedEvent state:SyncConnected type:None path:null
+    [zk: localhost:2181(CONNECTED) 0] ls /
+    [zookeeper]
+    [zk: localhost:2181(CONNECTED) 1]
+
+Please refer to the `ZooKeeper Guide`_ for more details on setting up 
zookeeper cluster.
+
+.. _ZooKeeper Guide: ../admin_guide/zookeeper
+
+Bookkeeper
+----------
+
+(If you already have a bookkeeper cluster running, you could skip this 
section.)
+
+We could use the `dlog-daemon.sh` and the `bookie.conf.template` to 
demonstrate run a 3-nodes
+bookkeeper cluster locally.
+
+Create a `bookie.conf` from the `bookie.conf.template`. Since we are going to 
run a 3-nodes
+bookkeeper cluster locally. Let's make three copies of `bookie.conf.template`.
+
+.. code-block:: bash
+
+    $ cp distributedlog-service/conf/bookie.conf.template 
distributedlog-service/conf/bookie-1.conf
+    $ cp distributedlog-service/conf/bookie.conf.template 
distributedlog-service/conf/bookie-2.conf
+    $ cp distributedlog-service/conf/bookie.conf.template 
distributedlog-service/conf/bookie-3.conf
+
+Configure the settings in the bookie configuraiont files.
+
+First of all, choose the zookeeper cluster that the bookies will use and set 
`zkServers` in
+the configuration files.
+
+::
+
+    zkServers=localhost:2181
+
+
+Choose the zookeeper path to store bookkeeper metadata and set 
`zkLedgersRootPath` in the configuration
+files. Let's use `/messaging/bookkeeper/ledgers` in this instruction.
+
+::
+
+    zkLedgersRootPath=/messaging/bookkeeper/ledgers
+
+
+Format bookkeeper metadata
+++++++++++++++++++++++++++
+
+(NOTE: only format bookkeeper metadata when first time setting up the 
bookkeeper cluster.)
+
+The bookkeeper shell doesn't automatically create the `zkLedgersRootPath` when 
running `metaformat`.
+So using `zkshell` to create the `zkLedgersRootPath`.
+
+::
+
+    $ ./distributedlog-service/bin/dlog zkshell localhost:2181
+    Connecting to localhost:2181
+    Welcome to ZooKeeper!
+    JLine support is enabled
+
+    WATCHER::
+
+    WatchedEvent state:SyncConnected type:None path:null
+    [zk: localhost:2181(CONNECTED) 0] create /messaging ''
+    Created /messaging
+    [zk: localhost:2181(CONNECTED) 1] create /messaging/bookkeeper ''
+    Created /messaging/bookkeeper
+    [zk: localhost:2181(CONNECTED) 2] create /messaging/bookkeeper/ledgers ''
+    Created /messaging/bookkeeper/ledgers
+    [zk: localhost:2181(CONNECTED) 3]
+
+
+If the `zkLedgersRootPath`, run `metaformat` to format the bookkeeper metadata.
+
+::
+
+    $ BOOKIE_CONF=${DL_HOME}/distributedlog-service/conf/bookie-1.conf 
./distributedlog-service/bin/dlog bkshell metaformat
+    Are you sure to format bookkeeper metadata ? (Y or N) Y
+
+Add Bookies
++++++++++++
+
+Once the bookkeeper metadata is formatted, it is ready to add bookie nodes to 
the cluster.
+
+Configure Ports
+^^^^^^^^^^^^^^^
+
+Configure the ports that used by bookies.
+
+bookie-1:
+
+::
+
+    # Port that bookie server listen on
+    bookiePort=3181
+    # Exporting codahale stats
+    185 codahaleStatsHttpPort=9001
+
+bookie-2:
+
+::
+
+    # Port that bookie server listen on
+    bookiePort=3182
+    # Exporting codahale stats
+    185 codahaleStatsHttpPort=9002
+
+bookie-3:
+
+::
+
+    # Port that bookie server listen on
+    bookiePort=3183
+    # Exporting codahale stats
+    185 codahaleStatsHttpPort=9003
+
+Configure Disk Layout
+^^^^^^^^^^^^^^^^^^^^^
+
+Configure the disk directories used by a bookie server by setting following 
options.
+
+::
+
+    # Directory Bookkeeper outputs its write ahead log
+    journalDirectory=/tmp/data/bk/journal
+    # Directory Bookkeeper outputs ledger snapshots
+    ledgerDirectories=/tmp/data/bk/ledgers
+    # Directory in which index files will be stored.
+    indexDirectories=/tmp/data/bk/ledgers
+
+As we are configuring a 3-nodes bookkeeper cluster, we modify the following 
settings as below:
+
+bookie-1:
+
+::
+
+    # Directory Bookkeeper outputs its write ahead log
+    journalDirectory=/tmp/data/bk-1/journal
+    # Directory Bookkeeper outputs ledger snapshots
+    ledgerDirectories=/tmp/data/bk-1/ledgers
+    # Directory in which index files will be stored.
+    indexDirectories=/tmp/data/bk-1/ledgers
+
+bookie-2:
+
+::
+
+    # Directory Bookkeeper outputs its write ahead log
+    journalDirectory=/tmp/data/bk-2/journal
+    # Directory Bookkeeper outputs ledger snapshots
+    ledgerDirectories=/tmp/data/bk-2/ledgers
+    # Directory in which index files will be stored.
+    indexDirectories=/tmp/data/bk-2/ledgers
+
+bookie-3:
+
+::
+
+    # Directory Bookkeeper outputs its write ahead log
+    journalDirectory=/tmp/data/bk-3/journal
+    # Directory Bookkeeper outputs ledger snapshots
+    ledgerDirectories=/tmp/data/bk-3/ledgers
+    # Directory in which index files will be stored.
+    indexDirectories=/tmp/data/bk-3/ledgers
+
+Format bookie
+^^^^^^^^^^^^^
+
+Once the disk directories are configured correctly in the configuration file, 
use
+`bkshell bookieformat` to format the bookie.
+
+::
+
+    BOOKIE_CONF=${DL_HOME}/distributedlog-service/conf/bookie-1.conf 
./distributedlog-service/bin/dlog bkshell bookieformat
+    BOOKIE_CONF=${DL_HOME}/distributedlog-service/conf/bookie-2.conf 
./distributedlog-service/bin/dlog bkshell bookieformat
+    BOOKIE_CONF=${DL_HOME}/distributedlog-service/conf/bookie-3.conf 
./distributedlog-service/bin/dlog bkshell bookieformat
+
+
+Start bookie
+^^^^^^^^^^^^
+
+Start the bookie using `dlog-daemon.sh`.
+
+::
+
+    SERVICE_PORT=3181 ./distributedlog-service/bin/dlog-daemon.sh start bookie 
--conf ${DL_HOME}/distributedlog-service/conf/bookie-1.conf
+    SERVICE_PORT=3182 ./distributedlog-service/bin/dlog-daemon.sh start bookie 
--conf ${DL_HOME}/distributedlog-service/conf/bookie-2.conf
+    SERVICE_PORT=3183 ./distributedlog-service/bin/dlog-daemon.sh start bookie 
--conf ${DL_HOME}/distributedlog-service/conf/bookie-3.conf
+
+Verify whether the bookie is setup correctly. You could simply check whether 
the bookie is showed up in
+zookeeper `zkLedgersRootPath`/available znode.
+
+::
+
+    $ ./distributedlog-service/bin/dlog zkshell localhost:2181
+    Connecting to localhost:2181
+    Welcome to ZooKeeper!
+    JLine support is enabled
+
+    WATCHER::
+
+    WatchedEvent state:SyncConnected type:None path:null
+    [zk: localhost:2181(CONNECTED) 0] ls 
/messaging/bookkeeper/ledgers/available
+    [127.0.0.1:3181, 127.0.0.1:3182, 127.0.0.1:3183, readonly]
+    [zk: localhost:2181(CONNECTED) 1]
+
+
+Or check if the bookie is exposing the stats at port `codahaleStatsHttpPort`.
+
+::
+
+    // ping the service
+    $ curl localhost:9001/ping
+    pong
+    // checking the stats
+    curl localhost:9001/metrics?pretty=true
+
+Stop bookie
+^^^^^^^^^^^
+
+Stop the bookie using `dlog-daemon.sh`.
+
+::
+
+    $ ./distributedlog-service/bin/dlog-daemon.sh stop bookie
+    // Example:
+    $ SERVICE_PORT=3181 ./distributedlog-service/bin/dlog-daemon.sh stop bookie
+    doing stop bookie ...
+    stopping bookie
+    Shutdown is in progress... Please wait...
+    Shutdown completed.
+
+Turn bookie to readonly
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Start the bookie in `readonly` mode.
+
+::
+
+    $ SERVICE_PORT=3181 ./distributedlog-service/bin/dlog-daemon.sh start 
bookie --conf ${DL_HOME}/distributedlog-service/conf/bookie-1.conf --readonly
+
+Verify if the bookie is running in `readonly` mode.
+
+::
+
+    $ ./distributedlog-service/bin/dlog zkshell localhost:2181
+    Connecting to localhost:2181
+    Welcome to ZooKeeper!
+    JLine support is enabled
+
+    WATCHER::
+
+    WatchedEvent state:SyncConnected type:None path:null
+    [zk: localhost:2181(CONNECTED) 0] ls 
/messaging/bookkeeper/ledgers/available
+    [127.0.0.1:3182, 127.0.0.1:3183, readonly]
+    [zk: localhost:2181(CONNECTED) 1] ls 
/messaging/bookkeeper/ledgers/available/readonly
+    [127.0.0.1:3181]
+    [zk: localhost:2181(CONNECTED) 2]
+
+Please refer to the `BookKeeper Guide`_ for more details on setting up 
bookkeeper cluster.
+
+.. _BookKeeper Guide: ../admin_guide/bookkeeper
+
+Create Namespace
+----------------
+
+After setting up a zookeeper cluster and a bookkeeper cluster, you could 
provision DL namespaces
+for applications to use.
+
+Provisioning a DistributedLog namespace is accomplished via the `bind` command 
available in `dlog tool`.
+
+Namespace is bound by writing bookkeeper environment settings (e.g. the ledger 
path, bkLedgersZkPath,
+or the set of Zookeeper servers used by bookkeeper, bkZkServers) as metadata 
in the zookeeper path of
+the namespace DL URI. The DL library resolves the DL URI to determine which 
bookkeeper cluster it
+should read and write to. 
+
+The namespace binding has following features:
+
+- `Inheritance`: suppose 
`distributedlog://<zkservers>/messaging/distributedlog` is bound to bookkeeper
+  cluster `X`. All the streams created under 
`distributedlog://<zkservers>/messaging/distributedlog`,
+  will write to bookkeeper cluster `X`.
+- `Override`: suppose `distributedlog://<zkservers>/messaging/distributedlog` 
is bound to bookkeeper
+  cluster `X`. You want streams under 
`distributedlog://<zkservers>/messaging/distributedlog/S` write
+  to bookkeeper cluster `Y`. You could just bind 
`distributedlog://<zkservers>/messaging/distributedlog/S`
+  to bookkeeper cluster `Y`. The binding to 
`distributedlog://<zkservers>/messaging/distributedlog/S`
+  only affects streams under 
`distributedlog://<zkservers>/messaging/distributedlog/S`.
+
+Create namespace binding using `dlog tool`. For example, we create a namespace
+`distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace` 
pointing to the
+bookkeeper cluster we just created above.
+
+::
+
+    $ distributedlog-service/bin/dlog admin bind \\
+        -dlzr 127.0.0.1:2181 \\
+        -dlzw 127.0.0.1:2181 \\
+        -s 127.0.0.1:2181 \\
+        -bkzr 127.0.0.1:2181 \\
+        -l /messaging/bookkeeper/ledgers \\
+        -i false \\
+        -r true \\
+        -c \\
+        distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace
+
+    No bookkeeper is bound to 
distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace
+    Created binding on 
distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace.
+
+
+- Configure the zookeeper cluster used for storing DistributedLog metadata: 
`-dlzr` and `-dlzw`.
+  Ideally `-dlzr` and `-dlzw` would be same the zookeeper server in 
distributedlog namespace uri.
+  However to scale zookeeper reads, the zookeeper observers sometimes are 
added in a different
+  domain name than participants. In such case, configuring `-dlzr` and `-dlzw` 
to different
+  zookeeper domain names would help isolating zookeeper write and read traffic.
+- Configure the zookeeper cluster used by bookkeeper for storing the metadata 
: `-bkzr` and `-s`.
+  Similar as `-dlzr` and `-dlzw`, you could configure the namespace to use 
different zookeeper
+  domain names for readers and writers to access bookkeeper metadatadata.
+- Configure the bookkeeper ledgers path: `-l`.
+- Configure the zookeeper path to store DistributedLog metadata. It is 
implicitly included as part
+  of namespace URI.
+
+Write Proxy
+-----------
+
+A write proxy consists of multiple write proxies. They don't store any state 
locally. So they are
+mostly stateless and can be run as many as you can.
+
+Configuration
++++++++++++++
+
+Different from bookkeeper, DistributedLog tries not to configure any 
environment related settings
+in configuration files. Any environment related settings are stored and 
configured via `namespace binding`.
+The configuration file should contain non-environment related settings.
+
+There is a `write_proxy.conf` template file available under 
`distributedlog-service` module.
+
+Run write proxy
++++++++++++++++
+
+A write proxy could be started using `dlog-daemon.sh` script under 
`distributedlog-service`.
+
+::
+
+    WP_SHARD_ID=${WP_SHARD_ID} WP_SERVICE_PORT=${WP_SERVICE_PORT} 
WP_STATS_PORT=${WP_STATS_PORT} ./distributedlog-service/bin/dlog-daemon.sh 
start writeproxy
+
+- `WP_SHARD_ID`: A non-negative integer. You don't need to guarantee 
uniqueness of shard id, as it is just an
+  indicator to the client for routing the requests. If you are running the 
`write proxy` using a cluster scheduler
+  like `aurora`, you could easily obtain a shard id and use that to configure 
`WP_SHARD_ID`.
+- `WP_SERVICE_PORT`: The port that write proxy listens on.
+- `WP_STATS_PORT`: The port that write proxy exposes stats to a http endpoint.
+
+Please check `distributedlog-service/conf/dlogenv.sh` for more environment 
variables on configuring write proxy.
+
+- `WP_CONF_FILE`: The path to the write proxy configuration file.
+- `WP_NAMESPACE`: The distributedlog namespace that the write proxy is serving 
for.
+
+For example, we start 3 write proxies locally and point to the namespace 
created above.
+
+::
+
+    $ WP_SHARD_ID=1 WP_SERVICE_PORT=4181 WP_STATS_PORT=20001 
./distributedlog-service/bin/dlog-daemon.sh start writeproxy
+    $ WP_SHARD_ID=2 WP_SERVICE_PORT=4182 WP_STATS_PORT=20002 
./distributedlog-service/bin/dlog-daemon.sh start writeproxy
+    $ WP_SHARD_ID=3 WP_SERVICE_PORT=4183 WP_STATS_PORT=20003 
./distributedlog-service/bin/dlog-daemon.sh start writeproxy
+
+The write proxy will announce itself to the zookeeper path `.write_proxy` 
under the dl namespace path.
+
+We could verify that the write proxy is running correctly by checking the 
zookeeper path or checking its stats port.
+
+::
+
+    $ ./distributedlog-service/bin/dlog zkshell localhost:2181
+    Connecting to localhost:2181
+    Welcome to ZooKeeper!
+    JLine support is enabled
+
+    WATCHER::
+
+    WatchedEvent state:SyncConnected type:None path:null
+    [zk: localhost:2181(CONNECTED) 0] ls 
/messaging/distributedlog/mynamespace/.write_proxy
+    [member_0000000000, member_0000000001, member_0000000002]
+
+
+::
+
+    $ curl localhost:20001/ping
+    pong
+
+
+Add and Remove Write Proxies
+++++++++++++++++++++++++++++
+
+Removing a write proxy is pretty straightforward by just killing the process.
+
+::
+
+    WP_SHARD_ID=1 WP_SERVICE_PORT=4181 WP_STATS_PORT=10001 
./distributedlog-service/bin/dlog-daemon.sh stop writeproxy
+
+
+Adding a new write proxy is just adding a new host and starting the write proxy
+process as described above.
+
+Write Proxy Naming
+++++++++++++++++++
+
+The `dlog-daemon.sh` script starts the write proxy by announcing it to the 
`.write_proxy` path under
+the dl namespace. So you could use uri in the distributedlog client builder to 
access the write proxy cluster.
+
+Verify the setup
+++++++++++++++++
+
+You could verify the write proxy cluster by running tutorials over the setup 
cluster.
+
+Create 10 streams.
+
+::
+    
+    $ ./distributedlog-service/bin/dlog tool create -u 
distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace -r stream- 
-e 0-10
+    You are going to create streams : [stream-0, stream-1, stream-2, stream-3, 
stream-4, stream-5, stream-6, stream-7, stream-8, stream-9, stream-10] (Y or N) 
Y
+
+
+Tail read from the 10 streams.
+
+::
+
+    $ ./distributedlog-tutorials/distributedlog-basic/bin/runner run 
org.apache.distributedlog.basic.MultiReader 
distributedlog://127.0.0.1:2181/messaging/distributedlog/mynamespace 
stream-0,stream-1,stream-2,stream-3,stream-4,stream-5,stream-6,stream-7,stream-8,stream-9,stream-10
+
+
+Run record generator over some streams
+
+::
+
+    $ ./distributedlog-tutorials/distributedlog-basic/bin/runner run 
org.apache.distributedlog.basic.RecordGenerator 
'zk!127.0.0.1:2181!/messaging/distributedlog/mynamespace/.write_proxy' stream-0 
100
+    $ ./distributedlog-tutorials/distributedlog-basic/bin/runner run 
org.apache.distributedlog.basic.RecordGenerator 
'zk!127.0.0.1:2181!/messaging/distributedlog/mynamespace/.write_proxy' stream-1 
100
+
+
+Check the terminal running `MultiReader`. You will see similar output as below:
+
+::
+
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=21044, slotId=0} from 
stream stream-0
+    """
+    record-1464085079105
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=21046, slotId=0} from 
stream stream-0
+    """
+    record-1464085079113
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=9636, slotId=0} from 
stream stream-1
+    """
+    record-1464085079110
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=21048, slotId=0} from 
stream stream-0
+    """
+    record-1464085079125
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=9638, slotId=0} from 
stream stream-1
+    """
+    record-1464085079121
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=21050, slotId=0} from 
stream stream-0
+    """
+    record-1464085079133
+    """
+    Received record DLSN{logSegmentSequenceNo=1, entryId=9640, slotId=0} from 
stream stream-1
+    """
+    record-1464085079130
+    """
+
+
+
+Please refer to the Performance_ page for more details on tuning performance.
+
+.. _Performance: ../admin_guide/performance

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/deployment/docker.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/deployment/docker.rst 
b/website/docs/0.4.0-incubating/deployment/docker.rst
new file mode 100644
index 0000000..d9fd87a
--- /dev/null
+++ b/website/docs/0.4.0-incubating/deployment/docker.rst
@@ -0,0 +1,49 @@
+---
+title: Docker
+top-nav-group: deployment
+top-nav-pos: 2
+top-nav-title: Docker
+layout: default
+---
+
+.. contents:: This page provides instructions on how to deploy 
**DistributedLog** using docker.
+
+Docker Setup
+============
+
+Prerequesites
+-------------
+1. Docker
+
+Steps
+-----
+1. Create a snapshot using
+
+.. code-block:: bash
+
+    ./scripts/snapshot
+
+
+2. Create your own docker image using
+
+.. code-block:: bash
+
+    docker build -t <your image name> .
+
+
+3. You can run the docker container using
+
+.. code-block:: bash
+
+    docker run -e ZK_SERVERS=<zk server list> -e DEPLOY_BK=<true|false> -e 
DEPLOY_WP=<true|false> <your image name>
+
+
+Environment variables
+---------------------
+
+Following are the environment variables which can change how the docker 
container runs.
+
+1. **ZK_SERVERS**: ZK servers running exernally (the container does not run a 
zookeeper)
+2. **DEPLOY_BOTH**: Deploys writeproxies as well as the bookies
+3. **DEPLOY_WP**: Flag to notify that a writeproxy needs to be deployed
+4. **DEPLOY_BK**: Flag to notify that a bookie needs to be deployed

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/deployment/global-cluster.rst
----------------------------------------------------------------------
diff --git a/website/docs/0.4.0-incubating/deployment/global-cluster.rst 
b/website/docs/0.4.0-incubating/deployment/global-cluster.rst
new file mode 100644
index 0000000..a8c1523
--- /dev/null
+++ b/website/docs/0.4.0-incubating/deployment/global-cluster.rst
@@ -0,0 +1,113 @@
+---
+title: Global Cluster Setup
+
+top-nav-group: deployment
+top-nav-pos: 1
+top-nav-title: Global Cluster Setup
+
+# Sub-level navigation
+sub-nav-group: admin-guide
+sub-nav-id: cluster-deployment
+sub-nav-parent: admin-guide
+sub-nav-group-title: Global Cluster Setup
+sub-nav-pos: 1
+sub-nav-title: Global Cluster Setup
+
+layout: default
+---
+
+.. contents:: This page provides instructions on how to run **DistributedLog** 
across multiple regions.
+
+
+Cluster Setup & Deployment
+==========================
+
+Setting up `globally replicated DistributedLog 
<../user_guide/globalreplicatedlog/main>`_ is very similar to setting up local 
DistributedLog.
+The most important change is use a ZooKeeper cluster configured across 
multiple-regions. Once set up, DistributedLog
+and BookKeeper are configured to use the global ZK cluster for all metadata 
storage, and the system will more or
+less work. The remaining steps are necessary to ensure things like durability 
in the face of total region failure.
+
+The key differences with standard cluster setup are summarized below:
+
+- The zookeeper cluster must be running across all of the target regions, say 
A, B, C.
+
+- Region aware placement policy and a few other options must be configured in 
DL config.
+
+- DistributedLog clients should be configured to talk to all regions.
+
+We elaborate on these steps in the following sections.
+
+
+Global Zookeeper
+----------------
+
+When defining your server and participant lists in zookeeper configuration, a 
sufficient number of nodes from each
+region must be included.
+
+Please consult the ZooKeeper documentation for detailed Zookeeper setup 
instructions.
+
+
+DistributedLog Configuration
+----------------------------
+
+In multi-region DistributedLog several DL config changes are needed.
+
+Placement Policy
+++++++++++++++++
+
+The region-aware placement policy must be configured. Below, it is configured 
to place replicas across 3 regions, A, B, and C.
+
+::
+
+    # placement policy
+    
bkc.ensemblePlacementPolicy=org.apache.bookkeeper.client.RegionAwareEnsemblePlacementPolicy
+    bkc.reppRegionsToWrite=A;B;C
+    bkc.reppMinimumRegionsForDurability=2
+    bkc.reppEnableDurabilityEnforcementInReplace=true
+    bkc.reppEnableValidation=true
+
+Connection Timeouts
++++++++++++++++++++
+
+In global replicated mode, the proxy nodes will be writing to the entire 
ensemble, which exists in multiple regions.
+If cross-region latency is higher than local region latency (i.e. if its truly 
cross-region) then it is advisable to
+use a higher BookKeeper client connection tiemout.
+
+::
+
+    # setting connect timeout to 1 second for global cluster
+    bkc.connectTimeoutMillis=1000
+
+Quorum Size
++++++++++++
+
+It is advisable to run with a larger ensemble to ensure cluster health in the 
event of region loss (the ensemble
+will be split across all regions).
+
+The values of these settings will depend on your operational and durability 
requirements.
+
+::
+
+    ensemble-size=9
+    write-quorum-size=9
+    ack-quorum-size=5
+
+Client Configuration
+--------------------
+
+Although not required, it is recommended to configure the write client to use 
all available regions. Several methods
+in DistributedLogClientBuilder can be used to achieve this.
+
+.. code-block:: java
+
+    DistributedLogClientBuilder.serverSets
+    DistributedLogClientBuilder.finagleNameStrs
+
+
+Additional Steps
+================
+
+Other clients settings may need to be tuned - for example in the write client, 
timeouts will likely need to be
+increased.
+
+Aside from this however, cluster setup is exactly the same as `single region 
setup <cluster>`_.

http://git-wip-us.apache.org/repos/asf/incubator-distributedlog/blob/3469fc87/website/docs/0.4.0-incubating/fonts/bootstrap/glyphicons-halflings-regular.eot
----------------------------------------------------------------------
diff --git 
a/website/docs/0.4.0-incubating/fonts/bootstrap/glyphicons-halflings-regular.eot
 
b/website/docs/0.4.0-incubating/fonts/bootstrap/glyphicons-halflings-regular.eot
new file mode 100755
index 0000000..b93a495
Binary files /dev/null and 
b/website/docs/0.4.0-incubating/fonts/bootstrap/glyphicons-halflings-regular.eot
 differ

[08/12] incubator-distributedlog git commit: Release 0.4.0-incubating

Reply via email to