[08/32] storm git commit: STORM-1617: Release Specific Documentation 0.9.x

bobby Tue, 22 Mar 2016 08:31:24 -0700

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/STORM-UI-REST-API.md
----------------------------------------------------------------------
diff --git a/docs/STORM-UI-REST-API.md b/docs/STORM-UI-REST-API.md
new file mode 100644
index 0000000..2109ab2
--- /dev/null
+++ b/docs/STORM-UI-REST-API.md
@@ -0,0 +1,678 @@
+---
+title: Storm UI REST API
+layout: documentation
+documentation: true
+---
+
+# Storm UI REST API
+
+The Storm UI daemon provides a REST API that allows you to interact with a 
Storm cluster, which includes retrieving
+metrics data and configuration information as well as management operations 
such as starting or stopping topologies.
+
+
+# Data format
+
+The REST API returns JSON responses and supports JSONP.
+Clients can pass a callback query parameter to wrap JSON in the callback 
function.
+
+
+# Using the UI REST API
+
+_Note: It is recommended to ignore undocumented elements in the JSON response 
because future versions of Storm may not_
+_support those elements anymore._
+
+
+## REST API Base URL
+
+The REST API is part of the UI daemon of Storm (started by `storm ui`) and 
thus runs on the same host and port as the
+Storm UI (the UI daemon is often run on the same host as the Nimbus daemon).  
The port is configured by `ui.port`,
+which is set to `8080` by default (see [defaults.yaml](conf/defaults.yaml)).
+
+The API base URL would thus be:
+
+    http://<ui-host>:<ui-port>/api/v1/...
+
+You can use a tool such as `curl` to talk to the REST API:
+
+    # Request the cluster configuration.
+    # Note: We assume ui.port is configured to the default value of 8080.
+    $ curl http://<ui-host>:8080/api/v1/cluster/configuration
+
+##Impersonating a user in secure environment
+In a secure environment an authenticated user can impersonate another user. To 
impersonate a user the caller must pass
+`doAsUser` param or header with value set to the user that the request needs 
to be performed as. Please see SECURITY.MD
+to learn more about how to setup impersonation ACLs and authorization. The 
rest API uses the same configs and acls that
+are used by nimbus.
+
+Examples:
+
+```no-highlight
+ 1. 
http://ui-daemon-host-name:8080/api/v1/topology/wordcount-1-1425844354\?doAsUser=testUSer1
+ 2. curl 
'http://localhost:8080/api/v1/topology/wordcount-1-1425844354/activate' -X POST 
-H 'doAsUser:testUSer1'
+```
+
+## GET Operations
+
+### /api/v1/cluster/configuration (GET)
+
+Returns the cluster configuration.
+
+Sample response (does not include all the data fields):
+
+```json
+  {
+    "dev.zookeeper.path": "/tmp/dev-storm-zookeeper",
+    "topology.tick.tuple.freq.secs": null,
+    "topology.builtin.metrics.bucket.size.secs": 60,
+    "topology.fall.back.on.java.serialization": true,
+    "topology.max.error.report.per.interval": 5,
+    "zmq.linger.millis": 5000,
+    "topology.skip.missing.kryo.registrations": false,
+    "storm.messaging.netty.client_worker_threads": 1,
+    "ui.childopts": "-Xmx768m",
+    "storm.zookeeper.session.timeout": 20000,
+    "nimbus.reassign": true,
+    "topology.trident.batch.emit.interval.millis": 500,
+    "storm.messaging.netty.flush.check.interval.ms": 10,
+    "nimbus.monitor.freq.secs": 10,
+    "logviewer.childopts": "-Xmx128m",
+    "java.library.path": "/usr/local/lib:/opt/local/lib:/usr/lib",
+    "topology.executor.send.buffer.size": 1024,
+    }
+```
+
+### /api/v1/cluster/summary (GET)
+
+Returns cluster summary information such as nimbus uptime or number of 
supervisors.
+
+Response fields:
+
+|Field  |Value|Description
+|---   |---    |---
+|stormVersion|String| Storm version|
+|nimbusUptime|String| Shows how long the cluster is running|
+|supervisors|Integer| Number of supervisors running|
+|topologies| Integer| Number of topologies running| 
+|slotsTotal| Integer|Total number of available worker slots|
+|slotsUsed| Integer| Number of worker slots used|
+|slotsFree| Integer |Number of worker slots available|
+|executorsTotal| Integer |Total number of executors|
+|tasksTotal| Integer |Total tasks|
+
+Sample response:
+
+```json
+   {
+    "stormVersion": "0.9.2-incubating-SNAPSHOT",
+    "nimbusUptime": "3m 53s",
+    "supervisors": 1,
+    "slotsTotal": 4,
+    "slotsUsed": 3,
+    "slotsFree": 1,
+    "executorsTotal": 28,
+    "tasksTotal": 28
+    }
+```
+
+### /api/v1/supervisor/summary (GET)
+
+Returns summary information for all supervisors.
+
+Response fields:
+
+|Field  |Value|Description|
+|---   |---    |---
+|id| String | Supervisor's id|
+|host| String| Supervisor's host name|
+|uptime| String| Shows how long the supervisor is running|
+|slotsTotal| Integer| Total number of available worker slots for this 
supervisor|
+|slotsUsed| Integer| Number of worker slots used on this supervisor|
+
+Sample response:
+
+```json
+{
+    "supervisors": [
+        {
+            "id": "0b879808-2a26-442b-8f7d-23101e0c3696",
+            "host": "10.11.1.7",
+            "uptime": "5m 58s",
+            "slotsTotal": 4,
+            "slotsUsed": 3
+        }
+    ]
+}
+```
+
+### /api/v1/topology/summary (GET)
+
+Returns summary information for all topologies.
+
+Response fields:
+
+|Field  |Value | Description|
+|---   |---    |---
+|id| String| Topology Id|
+|name| String| Topology Name|
+|status| String| Topology Status|
+|uptime| String|  Shows how long the topology is running|
+|tasksTotal| Integer |Total number of tasks for this topology|
+|workersTotal| Integer |Number of workers used for this topology|
+|executorsTotal| Integer |Number of executors used for this topology|
+
+Sample response:
+
+```json
+{
+    "topologies": [
+        {
+            "id": "WordCount3-1-1402960825",
+            "name": "WordCount3",
+            "status": "ACTIVE",
+            "uptime": "6m 5s",
+            "tasksTotal": 28,
+            "workersTotal": 3,
+            "executorsTotal": 28
+        }
+    ]
+}
+```
+
+### /api/v1/topology/:id (GET)
+
+Returns topology information and statistics.  Substitute id with topology id.
+
+Request parameters:
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+|window    |String. Default value :all-time| Window duration for metrics in 
seconds|
+|sys       |String. Values 1 or 0. Default value 0| Controls including sys 
stats part of the response|
+
+
+Response fields:
+
+|Field  |Value |Description|
+|---   |---    |---
+|id| String| Topology Id|
+|name| String |Topology Name|
+|uptime| String |How long the topology has been running|
+|status| String |Current status of the topology, e.g. "ACTIVE"|
+|tasksTotal| Integer |Total number of tasks for this topology|
+|workersTotal| Integer |Number of workers used for this topology|
+|executorsTotal| Integer |Number of executors used for this topology|
+|msgTimeout| Integer | Number of seconds a tuple has before the spout 
considers it failed |
+|windowHint| String | window param value in "hh mm ss" format. Default value 
is "All Time"|
+|topologyStats| Array | Array of all the topology related stats per time 
window|
+|topologyStats.windowPretty| String |Duration passed in HH:MM:SS format|
+|topologyStats.window| String |User requested time window for metrics|
+|topologyStats.emitted| Long |Number of messages emitted in given window|
+|topologyStats.trasferred| Long |Number messages transferred in given window|
+|topologyStats.completeLatency| String (double value returned in String 
format) |Total latency for processing the message|
+|topologyStats.acked| Long |Number of messages acked in given window|
+|topologyStats.failed| Long |Number of messages failed in given window|
+|spouts| Array | Array of all the spout components in the topology|
+|spouts.spoutId| String |Spout id|
+|spouts.executors| Integer |Number of executors for the spout|
+|spouts.emitted| Long |Number of messages emitted in given window |
+|spouts.completeLatency| String (double value returned in String format) 
|Total latency for processing the message|
+|spouts.transferred| Long |Total number of messages  transferred in given 
window|
+|spouts.tasks| Integer |Total number of tasks for the spout|
+|spouts.lastError| String |Shows the last error happened in a spout|
+|spouts.errorLapsedSecs| Integer | Number of seconds elapsed since that last 
error happened in a spout|
+|spouts.errorWorkerLogLink| String | Link to the worker log that reported the 
exception |
+|spouts.acked| Long |Number of messages acked|
+|spouts.failed| Long |Number of messages failed|
+|bolts| Array | Array of bolt components in the topology|
+|bolts.boltId| String |Bolt id|
+|bolts.capacity| String (double value returned in String format) |This value 
indicates number of messages executed * average execute latency / time window|
+|bolts.processLatency| String (double value returned in String format)  
|Average time of the bolt to ack a message after it was received|
+|bolts.executeLatency| String (double value returned in String format) 
|Average time to run the execute method of the bolt|
+|bolts.executors| Integer |Number of executor tasks in the bolt component|
+|bolts.tasks| Integer |Number of instances of bolt|
+|bolts.acked| Long |Number of tuples acked by the bolt|
+|bolts.failed| Long |Number of tuples failed by the bolt|
+|bolts.lastError| String |Shows the last error occurred in the bolt|
+|bolts.errorLapsedSecs| Integer |Number of seconds elapsed since that last 
error happened in a bolt|
+|bolts.errorWorkerLogLink| String | Link to the worker log that reported the 
exception |
+|bolts.emitted| Long |Number of tuples emitted|
+
+Examples:
+
+```no-highlight
+ 1. http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825
+ 2. 
http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825?sys=1
+ 3. 
http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825?window=600
+```
+
+Sample response:
+
+```json
+ {
+    "name": "WordCount3",
+    "id": "WordCount3-1-1402960825",
+    "workersTotal": 3,
+    "window": "600",
+    "status": "ACTIVE",
+    "tasksTotal": 28,
+    "executorsTotal": 28,
+    "uptime": "29m 19s",
+    "msgTimeout": 30,
+    "windowHint": "10m 0s",
+    "topologyStats": [
+        {
+            "windowPretty": "10m 0s",
+            "window": "600",
+            "emitted": 397960,
+            "transferred": 213380,
+            "completeLatency": "0.000",
+            "acked": 213460,
+            "failed": 0
+        },
+        {
+            "windowPretty": "3h 0m 0s",
+            "window": "10800",
+            "emitted": 1190260,
+            "transferred": 638260,
+            "completeLatency": "0.000",
+            "acked": 638280,
+            "failed": 0
+        },
+        {
+            "windowPretty": "1d 0h 0m 0s",
+            "window": "86400",
+            "emitted": 1190260,
+            "transferred": 638260,
+            "completeLatency": "0.000",
+            "acked": 638280,
+            "failed": 0
+        },
+        {
+            "windowPretty": "All time",
+            "window": ":all-time",
+            "emitted": 1190260,
+            "transferred": 638260,
+            "completeLatency": "0.000",
+            "acked": 638280,
+            "failed": 0
+        }
+    ],
+    "spouts": [
+        {
+            "executors": 5,
+            "emitted": 28880,
+            "completeLatency": "0.000",
+            "transferred": 28880,
+            "acked": 0,
+            "spoutId": "spout",
+            "tasks": 5,
+            "lastError": "",
+            "errorLapsedSecs": null,
+            "failed": 0
+        }
+    ],
+        "bolts": [
+        {
+            "executors": 12,
+            "emitted": 184580,
+            "transferred": 0,
+            "acked": 184640,
+            "executeLatency": "0.048",
+            "tasks": 12,
+            "executed": 184620,
+            "processLatency": "0.043",
+            "boltId": "count",
+            "lastError": "",
+            "errorLapsedSecs": null,
+            "capacity": "0.003",
+            "failed": 0
+        },
+        {
+            "executors": 8,
+            "emitted": 184500,
+            "transferred": 184500,
+            "acked": 28820,
+            "executeLatency": "0.024",
+            "tasks": 8,
+            "executed": 28780,
+            "processLatency": "2.112",
+            "boltId": "split",
+            "lastError": "",
+            "errorLapsedSecs": null,
+            "capacity": "0.000",
+            "failed": 0
+        }
+    ],
+    "configuration": {
+        "storm.id": "WordCount3-1-1402960825",
+        "dev.zookeeper.path": "/tmp/dev-storm-zookeeper",
+        "topology.tick.tuple.freq.secs": null,
+        "topology.builtin.metrics.bucket.size.secs": 60,
+        "topology.fall.back.on.java.serialization": true,
+        "topology.max.error.report.per.interval": 5,
+        "zmq.linger.millis": 5000,
+        "topology.skip.missing.kryo.registrations": false,
+        "storm.messaging.netty.client_worker_threads": 1,
+        "ui.childopts": "-Xmx768m",
+        "storm.zookeeper.session.timeout": 20000,
+        "nimbus.reassign": true,
+        "topology.trident.batch.emit.interval.millis": 500,
+        "storm.messaging.netty.flush.check.interval.ms": 10,
+        "nimbus.monitor.freq.secs": 10,
+        "logviewer.childopts": "-Xmx128m",
+        "java.library.path": "/usr/local/lib:/opt/local/lib:/usr/lib",
+        "topology.executor.send.buffer.size": 1024,
+        "storm.local.dir": "storm-local",
+        "storm.messaging.netty.buffer_size": 5242880,
+        "supervisor.worker.start.timeout.secs": 120,
+        "topology.enable.message.timeouts": true,
+        "nimbus.cleanup.inbox.freq.secs": 600,
+        "nimbus.inbox.jar.expiration.secs": 3600,
+        "drpc.worker.threads": 64,
+        "topology.worker.shared.thread.pool.size": 4,
+        "nimbus.host": "hw10843.local",
+        "storm.messaging.netty.min_wait_ms": 100,
+        "storm.zookeeper.port": 2181,
+        "transactional.zookeeper.port": null,
+        "topology.executor.receive.buffer.size": 1024,
+        "transactional.zookeeper.servers": null,
+        "storm.zookeeper.root": "/storm",
+        "storm.zookeeper.retry.intervalceiling.millis": 30000,
+        "supervisor.enable": true,
+        "storm.messaging.netty.server_worker_threads": 1
+    }
+}
+```
+
+
+### /api/v1/topology/:id/component/:component (GET)
+
+Returns detailed metrics and executor information
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+|component |String (required)| Component Id |
+|window    |String. Default value :all-time| window duration for metrics in 
seconds|
+|sys       |String. Values 1 or 0. Default value 0| controls including sys 
stats part of the response|
+
+Response fields:
+
+|Field  |Value |Description|
+|---   |---    |---
+|id   | String | Component id|
+|name | String | Topology name|
+|componentType | String | component type: SPOUT or BOLT|
+|windowHint| String | window param value in "hh mm ss" format. Default value 
is "All Time"|
+|executors| Integer |Number of executor tasks in the component|
+|componentErrors| Array of Errors | List of component errors|
+|componentErrors.time| Long | Timestamp when the exception occurred |
+|componentErrors.errorHost| String | host name for the error|
+|componentErrors.errorPort| String | port for the error|
+|componentErrors.error| String |Shows the error happened in a component|
+|componentErrors.errorLapsedSecs| Integer | Number of seconds elapsed since 
the error happened in a component |
+|componentErrors.errorWorkerLogLink| String | Link to the worker log that 
reported the exception |
+|topologyId| String | Topology id|
+|tasks| Integer |Number of instances of component|
+|window    |String. Default value "All Time" | window duration for metrics in 
seconds|
+|spoutSummary or boltStats| Array |Array of component stats. **Please note 
this element tag can be spoutSummary or boltStats depending on the 
componentType**|
+|spoutSummary.windowPretty| String |Duration passed in HH:MM:SS format|
+|spoutSummary.window| String | window duration for metrics in seconds|
+|spoutSummary.emitted| Long |Number of messages emitted in given window |
+|spoutSummary.completeLatency| String (double value returned in String format) 
|Total latency for processing the message|
+|spoutSummary.transferred| Long |Total number of messages  transferred in 
given window|
+|spoutSummary.acked| Long |Number of messages acked|
+|spoutSummary.failed| Long |Number of messages failed|
+|boltStats.windowPretty| String |Duration passed in HH:MM:SS format|
+|boltStats..window| String | window duration for metrics in seconds|
+|boltStats.transferred| Long |Total number of messages  transferred in given 
window|
+|boltStats.processLatency| String (double value returned in String format)  
|Average time of the bolt to ack a message after it was received|
+|boltStats.acked| Long |Number of messages acked|
+|boltStats.failed| Long |Number of messages failed|
+
+Examples:
+
+```no-highlight
+1. 
http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825/component/spout
+2. 
http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825/component/spout?sys=1
+3. 
http://ui-daemon-host-name:8080/api/v1/topology/WordCount3-1-1402960825/component/spout?window=600
+```
+
+Sample response:
+
+```json
+{
+    "name": "WordCount3",
+    "id": "spout",
+    "componentType": "spout",
+    "windowHint": "10m 0s",
+    "executors": 5,
+    "componentErrors":[{"time": 1406006074000,
+                        "errorHost": "10.11.1.70",
+                        "errorPort": 6701,
+                        "errorWorkerLogLink": 
"http://10.11.1.7:8000/log?file=worker-6701.log";,
+                        "errorLapsedSecs": 16,
+                        "error": "java.lang.RuntimeException: 
java.lang.StringIndexOutOfBoundsException: Some Error\n\tat 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)\n\tat
 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)\n\tat
 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)\n\tat
 backtype...more.."
+    }],
+    "topologyId": "WordCount3-1-1402960825",
+    "tasks": 5,
+    "window": "600",
+    "spoutSummary": [
+        {
+            "windowPretty": "10m 0s",
+            "window": "600",
+            "emitted": 28500,
+            "transferred": 28460,
+            "completeLatency": "0.000",
+            "acked": 0,
+            "failed": 0
+        },
+        {
+            "windowPretty": "3h 0m 0s",
+            "window": "10800",
+            "emitted": 127640,
+            "transferred": 127440,
+            "completeLatency": "0.000",
+            "acked": 0,
+            "failed": 0
+        },
+        {
+            "windowPretty": "1d 0h 0m 0s",
+            "window": "86400",
+            "emitted": 127640,
+            "transferred": 127440,
+            "completeLatency": "0.000",
+            "acked": 0,
+            "failed": 0
+        },
+        {
+            "windowPretty": "All time",
+            "window": ":all-time",
+            "emitted": 127640,
+            "transferred": 127440,
+            "completeLatency": "0.000",
+            "acked": 0,
+            "failed": 0
+        }
+    ],
+    "outputStats": [
+        {
+            "stream": "__metrics",
+            "emitted": 40,
+            "transferred": 0,
+            "completeLatency": "0",
+            "acked": 0,
+            "failed": 0
+        },
+        {
+            "stream": "default",
+            "emitted": 28460,
+            "transferred": 28460,
+            "completeLatency": "0",
+            "acked": 0,
+            "failed": 0
+        }
+    ],
+    "executorStats": [
+        {
+            "workerLogLink": "http://10.11.1.7:8000/log?file=worker-6701.log";,
+            "emitted": 5720,
+            "port": 6701,
+            "completeLatency": "0.000",
+            "transferred": 5720,
+            "host": "10.11.1.7",
+            "acked": 0,
+            "uptime": "43m 4s",
+            "id": "[24-24]",
+            "failed": 0
+        },
+        {
+            "workerLogLink": "http://10.11.1.7:8000/log?file=worker-6703.log";,
+            "emitted": 5700,
+            "port": 6703,
+            "completeLatency": "0.000",
+            "transferred": 5700,
+            "host": "10.11.1.7",
+            "acked": 0,
+            "uptime": "42m 57s",
+            "id": "[25-25]",
+            "failed": 0
+        },
+        {
+            "workerLogLink": "http://10.11.1.7:8000/log?file=worker-6702.log";,
+            "emitted": 5700,
+            "port": 6702,
+            "completeLatency": "0.000",
+            "transferred": 5680,
+            "host": "10.11.1.7",
+            "acked": 0,
+            "uptime": "42m 57s",
+            "id": "[26-26]",
+            "failed": 0
+        },
+        {
+            "workerLogLink": "http://10.11.1.7:8000/log?file=worker-6701.log";,
+            "emitted": 5700,
+            "port": 6701,
+            "completeLatency": "0.000",
+            "transferred": 5680,
+            "host": "10.11.1.7",
+            "acked": 0,
+            "uptime": "43m 4s",
+            "id": "[27-27]",
+            "failed": 0
+        },
+        {
+            "workerLogLink": "http://10.11.1.7:8000/log?file=worker-6703.log";,
+            "emitted": 5680,
+            "port": 6703,
+            "completeLatency": "0.000",
+            "transferred": 5680,
+            "host": "10.11.1.7",
+            "acked": 0,
+            "uptime": "42m 57s",
+            "id": "[28-28]",
+            "failed": 0
+        }
+    ]
+}
+```
+
+## POST Operations
+
+### /api/v1/topology/:id/activate (POST)
+
+Activates a topology.
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+
+Sample Response:
+
+```json
+{"topologyOperation":"activate","topologyId":"wordcount-1-1420308665","status":"success"}
+```
+
+
+### /api/v1/topology/:id/deactivate (POST)
+
+Deactivates a topology.
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+
+Sample Response:
+
+```json
+{"topologyOperation":"deactivate","topologyId":"wordcount-1-1420308665","status":"success"}
+```
+
+
+### /api/v1/topology/:id/rebalance/:wait-time (POST)
+
+Rebalances a topology.
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+|wait-time |String (required)| Wait time before rebalance happens |
+|rebalanceOptions| Json (optional) | topology rebalance options |
+
+
+Sample rebalanceOptions json:
+
+```json
+{"rebalanceOptions" : {"numWorkers" : 2, "executors" : {"spout" :4, "count" : 
10}}, "callback" : "foo"}
+```
+
+Examples:
+
+```no-highlight
+curl  -i -b ~/cookiejar.txt -c ~/cookiejar.txt -X POST  
+-H "Content-Type: application/json" 
+-d  '{"rebalanceOptions": {"numWorkers": 2, "executors": { "spout" : "5", 
"split": 7, "count": 5 }}, "callback":"foo"}' 
+http://localhost:8080/api/v1/topology/wordcount-1-1420308665/rebalance/0
+```
+
+Sample Response:
+
+```json
+{"topologyOperation":"rebalance","topologyId":"wordcount-1-1420308665","status":"success"}
+```
+
+
+
+### /api/v1/topology/:id/kill/:wait-time (POST)
+
+Kills a topology.
+
+|Parameter |Value   |Description  |
+|----------|--------|-------------|
+|id       |String (required)| Topology Id  |
+|wait-time |String (required)| Wait time before rebalance happens |
+
+Caution: Small wait times (0-5 seconds) may increase the probability of 
triggering the bug reported in
+[STORM-112](https://issues.apache.org/jira/browse/STORM-112), which may result 
in broker Supervisor
+daemons.
+
+Sample Response:
+
+```json
+{"topologyOperation":"kill","topologyId":"wordcount-1-1420308665","status":"success"}
+```
+
+## API errors
+
+The API returns 500 HTTP status codes in case of any errors.
+
+Sample response:
+
+```json
+{
+  "error": "Internal Server Error",
+  "errorMessage": "java.lang.NullPointerException\n\tat 
clojure.core$name.invoke(core.clj:1505)\n\tat 
backtype.storm.ui.core$component_page.invoke(core.clj:752)\n\tat 
backtype.storm.ui.core$fn__7766.invoke(core.clj:782)\n\tat 
compojure.core$make_route$fn__5755.invoke(core.clj:93)\n\tat 
compojure.core$if_route$fn__5743.invoke(core.clj:39)\n\tat 
compojure.core$if_method$fn__5736.invoke(core.clj:24)\n\tat 
compojure.core$routing$fn__5761.invoke(core.clj:106)\n\tat 
clojure.core$some.invoke(core.clj:2443)\n\tat 
compojure.core$routing.doInvoke(core.clj:106)\n\tat 
clojure.lang.RestFn.applyTo(RestFn.java:139)\n\tat 
clojure.core$apply.invoke(core.clj:619)\n\tat 
compojure.core$routes$fn__5765.invoke(core.clj:111)\n\tat 
ring.middleware.reload$wrap_reload$fn__6880.invoke(reload.clj:14)\n\tat 
backtype.storm.ui.core$catch_errors$fn__7800.invoke(core.clj:836)\n\tat 
ring.middleware.keyword_params$wrap_keyword_params$fn__6319.invoke(keyword_params.clj:27)\n\tat
 ring.middleware.nested_params$wrap_nest
 ed_params$fn__6358.invoke(nested_params.clj:65)\n\tat 
ring.middleware.params$wrap_params$fn__6291.invoke(params.clj:55)\n\tat 
ring.middleware.multipart_params$wrap_multipart_params$fn__6386.invoke(multipart_params.clj:103)\n\tat
 ring.middleware.flash$wrap_flash$fn__6675.invoke(flash.clj:14)\n\tat 
ring.middleware.session$wrap_session$fn__6664.invoke(session.clj:43)\n\tat 
ring.middleware.cookies$wrap_cookies$fn__6595.invoke(cookies.clj:160)\n\tat 
ring.adapter.jetty$proxy_handler$fn__6112.invoke(jetty.clj:16)\n\tat 
ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown
 Source)\n\tat 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat 
org.mortbay.jetty.Server.handle(Server.java:326)\n\tat 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)\n\tat
 org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat org.mortb
 ay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat 
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)\n\tat
 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\n"
+}
+```


http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Serialization-(prior-to-0.6.0).md
----------------------------------------------------------------------
diff --git a/docs/Serialization-(prior-to-0.6.0).md 
b/docs/Serialization-(prior-to-0.6.0).md
new file mode 100644
index 0000000..e4a0d4f
--- /dev/null
+++ b/docs/Serialization-(prior-to-0.6.0).md
@@ -0,0 +1,50 @@
+---
+layout: documentation
+---
+Tuples can be comprised of objects of any types. Since Storm is a distributed 
system, it needs to know how to serialize and deserialize objects when they're 
passed between tasks. By default Storm can serialize ints, shorts, longs, 
floats, doubles, bools, bytes, strings, and byte arrays, but if you want to use 
another type in your tuples, you'll need to implement a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in 
fields and Storm figures out the serialization dynamically. Before we get to 
the interface for serialization, let's spend a moment understanding why Storm's 
tuples are dynamically typed.
+
+Adding static typing to tuple fields would add large amount of complexity to 
Storm's API. Hadoop, for example, statically types its keys and values but 
requires a huge amount of annotations on the part of the user. Hadoop's API is 
a burden to use and the "type safety" isn't worth it. Dynamic typing is simply 
easier to use.
+
+Further than that, it's not possible to statically type Storm's tuples in any 
reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from 
all those streams may have different types across the fields. When a Bolt 
receives a `Tuple` in `execute`, that tuple could have come from any stream and 
so could have any combination of types. There might be some reflection magic 
you can do to declare a different method for every tuple stream a bolt 
subscribes to, but Storm opts for the simpler, straightforward approach of 
dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a 
straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+Let's dive into Storm's API for defining custom serializations. There are two 
steps you need to take as a user to create a custom serialization: implement 
the serializer, and register the serializer to Storm.
+
+#### Creating a serializer
+
+Custom serializers implement the 
[ISerialization](javadocs/backtype/storm/serialization/ISerialization.html) 
interface. Implementations specify how to serialize and deserialize types into 
a binary format.
+
+The interface looks like this:
+
+```java
+public interface ISerialization<T> {
+    public boolean accept(Class c);
+    public void serialize(T object, DataOutputStream stream) throws 
IOException;
+    public T deserialize(DataInputStream stream) throws IOException;
+}
+```
+
+Storm uses the `accept` method to determine if a type can be serialized by 
this serializer. Remember, Storm's tuples are dynamically typed so Storm 
determines what serializer to use at runtime.
+
+`serialize` writes the object out to the output stream in binary format. The 
field must be written in a way such that it can be deserialized later. For 
example, if you're writing out a list of objects, you'll need to write out the 
size of the list first so that you know how many elements to deserialize.
+
+`deserialize` reads the serialized object off of the stream and returns it.
+
+You can see example serialization implementations in the source for 
[SerializationFactory](https://github.com/apache/incubator-storm/blob/0.5.4/src/jvm/backtype/storm/serialization/SerializationFactory.java)
+
+#### Registering a serializer
+
+Once you create a serializer, you need to tell Storm it exists. This is done 
through the Storm configuration (See [Concepts](Concepts.html) for information 
about how configuration works in Storm). You can register serializations either 
through the config given when submitting a topology or in the storm.yaml files 
across your cluster.
+
+Serializer registrations are done through the Config.TOPOLOGY_SERIALIZATIONS 
config and is simply a list of serialization class names.
+
+Storm provides helpers for registering serializers in a topology config. The 
[Config](javadocs/backtype/storm/Config.html) class has a method called 
`addSerialization` that takes in a serializer class to add to the config.
+
+There's an advanced config called Config.TOPOLOGY_SKIP_MISSING_SERIALIZATIONS. 
If you set this to true, Storm will ignore any serializations that are 
registered but do not have their code available on the classpath. Otherwise, 
Storm will throw errors when it can't find a serialization. This is useful if 
you run many topologies on a cluster that each have different serializations, 
but you want to declare all the serializations across all topologies in the 
`storm.yaml` files.

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Serialization.md
----------------------------------------------------------------------
diff --git a/docs/Serialization.md b/docs/Serialization.md
new file mode 100644
index 0000000..4c271b4
--- /dev/null
+++ b/docs/Serialization.md
@@ -0,0 +1,60 @@
+---
+layout: documentation
+---
+This page is about how the serialization system in Storm works for versions 
0.6.0 and onwards. Storm used a different serialization system prior to 0.6.0 
which is documented on [Serialization (prior to 
0.6.0)](Serialization-\(prior-to-0.6.0\).html). 
+
+Tuples can be comprised of objects of any types. Since Storm is a distributed 
system, it needs to know how to serialize and deserialize objects when they're 
passed between tasks.
+
+Storm uses [Kryo](http://code.google.com/p/kryo/) for serialization. Kryo is a 
flexible and fast serialization library that produces small serializations.
+
+By default, Storm can serialize primitive types, strings, byte arrays, 
ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to 
use another type in your tuples, you'll need to register a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in 
fields and Storm figures out the serialization dynamically. Before we get to 
the interface for serialization, let's spend a moment understanding why Storm's 
tuples are dynamically typed.
+
+Adding static typing to tuple fields would add large amount of complexity to 
Storm's API. Hadoop, for example, statically types its keys and values but 
requires a huge amount of annotations on the part of the user. Hadoop's API is 
a burden to use and the "type safety" isn't worth it. Dynamic typing is simply 
easier to use.
+
+Further than that, it's not possible to statically type Storm's tuples in any 
reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from 
all those streams may have different types across the fields. When a Bolt 
receives a `Tuple` in `execute`, that tuple could have come from any stream and 
so could have any combination of types. There might be some reflection magic 
you can do to declare a different method for every tuple stream a bolt 
subscribes to, but Storm opts for the simpler, straightforward approach of 
dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a 
straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+As mentioned, Storm uses Kryo for serialization. To implement custom 
serializers, you need to register new serializers with Kryo. It's highly 
recommended that you read over [Kryo's home 
page](http://code.google.com/p/kryo/) to understand how it handles custom 
serialization.
+
+Adding custom serializers is done through the "topology.kryo.register" 
property in your topology config. It takes a list of registrations, where each 
registration can take one of two forms:
+
+1. The name of a class to register. In this case, Storm will use Kryo's 
`FieldsSerializer` to serialize the class. This may or may not be optimal for 
the class -- see the Kryo docs for more details.
+2. A map from the name of a class to register to an implementation of 
[com.esotericsoftware.kryo.Serializer](http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java).
+
+Let's look at an example.
+
+```
+topology.kryo.register:
+  - com.mycompany.CustomType1
+  - com.mycompany.CustomType2: com.mycompany.serializer.CustomType2Serializer
+  - com.mycompany.CustomType3
+```
+
+`com.mycompany.CustomType1` and `com.mycompany.CustomType3` will use the 
`FieldsSerializer`, whereas `com.mycompany.CustomType2` will use 
`com.mycompany.serializer.CustomType2Serializer` for serialization.
+
+Storm provides helpers for registering serializers in a topology config. The 
[Config](javadocs/backtype/storm/Config.html) class has a method called 
`registerSerialization` that takes in a registration to add to the config.
+
+There's an advanced config called 
`Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS`. If you set this to true, 
Storm will ignore any serializations that are registered but do not have their 
code available on the classpath. Otherwise, Storm will throw errors when it 
can't find a serialization. This is useful if you run many topologies on a 
cluster that each have different serializations, but you want to declare all 
the serializations across all topologies in the `storm.yaml` files.
+
+### Java serialization
+
+If Storm encounters a type for which it doesn't have a serialization 
registered, it will use Java serialization if possible. If the object can't be 
serialized with Java serialization, then Storm will throw an error.
+
+Beware that Java serialization is extremely expensive, both in terms of CPU 
cost as well as the size of the serialized object. It is highly recommended 
that you register custom serializers when you put the topology in production. 
The Java serialization behavior is there so that it's easy to prototype new 
topologies.
+
+You can turn off the behavior to fall back on Java serialization by setting 
the `Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION` config to false.
+
+### Component-specific serialization registrations
+
+Storm 0.7.0 lets you set component-specific configurations (read more about 
this at [Configuration](Configuration.html)). Of course, if one component 
defines a serialization that serialization will need to be available to other 
bolts -- otherwise they won't be able to receive messages from that component!
+
+When a topology is submitted, a single set of serializations is chosen to be 
used by all components in the topology for sending messages. This is done by 
merging the component-specific serializer registrations with the regular set of 
serialization registrations. If two components define serializers for the same 
class, one of the serializers is chosen arbitrarily.
+
+To force a serializer for a particular class if there's a conflict between two 
component-specific registrations, just define the serializer you want to use in 
the topology-specific configuration. The topology-specific configuration has 
precedence over component-specific configurations for serialization 
registrations.

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Serializers.md
----------------------------------------------------------------------
diff --git a/docs/Serializers.md b/docs/Serializers.md
new file mode 100644
index 0000000..071c885
--- /dev/null
+++ b/docs/Serializers.md
@@ -0,0 +1,4 @@
+---
+layout: documentation
+---
+* [storm-json](https://github.com/rapportive-oss/storm-json): Simple JSON 
serializer for Storm

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Setting-up-a-Storm-cluster.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-a-Storm-cluster.md 
b/docs/Setting-up-a-Storm-cluster.md
new file mode 100644
index 0000000..e139523
--- /dev/null
+++ b/docs/Setting-up-a-Storm-cluster.md
@@ -0,0 +1,83 @@
+---
+layout: documentation
+---
+This page outlines the steps for getting a Storm cluster up and running. If 
you're on AWS, you should check out the 
[storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki) project. 
[storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki) completely 
automates the provisioning, configuration, and installation of Storm clusters 
on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, and 
network usage.
+
+If you run into difficulties with your Storm cluster, first check for a 
solution is in the [Troubleshooting](Troubleshooting.html) page. Otherwise, 
email the mailing list.
+
+Here's a summary of the steps for setting up a Storm cluster:
+
+1. Set up a Zookeeper cluster
+2. Install dependencies on Nimbus and worker machines
+3. Download and extract a Storm release to Nimbus and worker machines
+4. Fill in mandatory configurations into storm.yaml
+5. Launch daemons under supervision using "storm" script and a supervisor of 
your choice
+
+### Set up a Zookeeper cluster
+
+Storm uses Zookeeper for coordinating the cluster. Zookeeper **is not** used 
for message passing, so the load Storm places on Zookeeper is quite low. Single 
node Zookeeper clusters should be sufficient for most cases, but if you want 
failover or are deploying large Storm clusters you may want larger Zookeeper 
clusters. Instructions for deploying Zookeeper are 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html). 
+
+A few notes about Zookeeper deployment:
+
+1. It's critical that you run Zookeeper under supervision, since Zookeeper is 
fail-fast and will exit the process if it encounters any error case. See 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision)
 for more details. 
+2. It's critical that you set up a cron to compact Zookeeper's data and 
transaction logs. The Zookeeper daemon does not do this on its own, and if you 
don't set up a cron, Zookeeper will quickly run out of disk space. See 
[here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance)
 for more details.
+
+### Install dependencies on Nimbus and worker machines
+
+Next you need to install Storm's dependencies on Nimbus and the worker 
machines. These are:
+
+1. Java 6
+2. Python 2.6.6
+
+These are the versions of the dependencies that have been tested with Storm. 
Storm may or may not work with different versions of Java and/or Python.
+
+
+### Download and extract a Storm release to Nimbus and worker machines
+
+Next, download a Storm release and extract the zip file somewhere on Nimbus 
and each of the worker machines. The Storm releases can be downloaded [from 
here](http://github.com/apache/incubator-storm/downloads).
+
+### Fill in mandatory configurations into storm.yaml
+
+The Storm release contains a file at `conf/storm.yaml` that configures the 
Storm daemons. You can see the default configuration values 
[here](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml).
 storm.yaml overrides anything in defaults.yaml. There's a few configurations 
that are mandatory to get a working cluster:
+
+1) **storm.zookeeper.servers**: This is a list of the hosts in the Zookeeper 
cluster for your Storm cluster. It should look something like:
+
+```yaml
+storm.zookeeper.servers:
+  - "111.222.333.444"
+  - "555.666.777.888"
+```
+
+If the port that your Zookeeper cluster uses is different than the default, 
you should set **storm.zookeeper.port** as well.
+
+2) **storm.local.dir**: The Nimbus and Supervisor daemons require a directory 
on the local disk to store small amounts of state (like jars, confs, and things 
like that). You should create that directory on each machine, give it proper 
permissions, and then fill in the directory location using this config. For 
example:
+
+```yaml
+storm.local.dir: "/mnt/storm"
+```
+
+3) **nimbus.host**: The worker nodes need to know which machine is the master 
in order to download topology jars and confs. For example:
+
+```yaml
+nimbus.host: "111.222.333.44"
+```
+
+4) **supervisor.slots.ports**: For each worker machine, you configure how many 
workers run on that machine with this config. Each worker uses a single port 
for receiving messages, and this setting defines which ports are open for use. 
If you define five ports here, then Storm will allocate up to five workers to 
run on this machine. If you define three ports, Storm will only run up to 
three. By default, this setting is configured to run 4 workers on the ports 
6700, 6701, 6702, and 6703. For example:
+
+```yaml
+supervisor.slots.ports:
+    - 6700
+    - 6701
+    - 6702
+    - 6703
+```
+
+### Launch daemons under supervision using "storm" script and a supervisor of 
your choice
+
+The last step is to launch all the Storm daemons. It is critical that you run 
each of these daemons under supervision. Storm is a __fail-fast__ system which 
means the processes will halt whenever an unexpected error is encountered. 
Storm is designed so that it can safely halt at any point and recover correctly 
when the process is restarted. This is why Storm keeps no state in-process -- 
if Nimbus or the Supervisors restart, the running topologies are unaffected. 
Here's how to run the Storm daemons:
+
+1. **Nimbus**: Run the command "bin/storm nimbus" under supervision on the 
master machine.
+2. **Supervisor**: Run the command "bin/storm supervisor" under supervision on 
each worker machine. The supervisor daemon is responsible for starting and 
stopping worker processes on that machine.
+3. **UI**: Run the Storm UI (a site you can access from the browser that gives 
diagnostics on the cluster and topologies) by running the command "bin/storm 
ui" under supervision. The UI can be accessed by navigating your web browser to 
http://{nimbus host}:8080. 
+
+As you can see, running the daemons is very straightforward. The daemons will 
log to the logs/ directory in wherever you extracted the Storm release.

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Setting-up-a-Storm-project-in-Eclipse.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-a-Storm-project-in-Eclipse.md 
b/docs/Setting-up-a-Storm-project-in-Eclipse.md
new file mode 100644
index 0000000..5137cd9
--- /dev/null
+++ b/docs/Setting-up-a-Storm-project-in-Eclipse.md
@@ -0,0 +1 @@
+- fill me in
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Setting-up-development-environment.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-development-environment.md 
b/docs/Setting-up-development-environment.md
new file mode 100644
index 0000000..07ba670
--- /dev/null
+++ b/docs/Setting-up-development-environment.md
@@ -0,0 +1,39 @@
+---
+layout: documentation
+---
+This page outlines what you need to do to get a Storm development environment 
set up. In summary, the steps are:
+
+1. Download a [Storm release](/releases.html) , unpack it, and put the 
unpacked `bin/` directory on your PATH
+2. To be able to start and stop topologies on a remote cluster, put the 
cluster information in `~/.storm/storm.yaml`
+
+More detail on each of these steps is below.
+
+### What is a development environment?
+
+Storm has two modes of operation: local mode and remote mode. In local mode, 
you can develop and test topologies completely in process on your local 
machine. In remote mode, you submit topologies for execution on a cluster of 
machines.
+
+A Storm development environment has everything installed so that you can 
develop and test Storm topologies in local mode, package topologies for 
execution on a remote cluster, and submit/kill topologies on a remote cluster.
+
+Let's quickly go over the relationship between your machine and a remote 
cluster. A Storm cluster is managed by a master node called "Nimbus". Your 
machine communicates with Nimbus to submit code (packaged as a jar) and 
topologies for execution on the cluster, and Nimbus will take care of 
distributing that code around the cluster and assigning workers to run your 
topology. Your machine uses a command line client called `storm` to communicate 
with Nimbus. The `storm` client is only used for remote mode; it is not used 
for developing and testing topologies in local mode.
+
+### Installing a Storm release locally
+
+If you want to be able to submit topologies to a remote cluster from your 
machine, you should install a Storm release locally. Installing a Storm release 
will give you the `storm` client that you can use to interact with remote 
clusters. To install Storm locally, download a release [from 
here](/releases.html) and unzip it somewhere on your computer. Then add the 
unpacked `bin/` directory onto your `PATH` and make sure the `bin/storm` script 
is executable.
+
+Installing a Storm release locally is only for interacting with remote 
clusters. For developing and testing topologies in local mode, it is 
recommended that you use Maven to include Storm as a dev dependency for your 
project. You can read more about using Maven for this purpose on 
[Maven](Maven.html). 
+
+### Starting and stopping topologies on a remote cluster
+
+The previous step installed the `storm` client on your machine which is used 
to communicate with remote Storm clusters. Now all you have to do is tell the 
client which Storm cluster to talk to. To do this, all you have to do is put 
the host address of the master in the `~/.storm/storm.yaml` file. It should 
look something like this:
+
+```
+nimbus.host: "123.45.678.890"
+```
+
+Alternatively, if you use the 
[storm-deploy](https://github.com/nathanmarz/storm-deploy) project to provision 
Storm clusters on AWS, it will automatically set up your ~/.storm/storm.yaml 
file. You can manually attach to a Storm cluster (or switch between multiple 
clusters) using the "attach" command, like so:
+
+```
+lein run :deploy --attach --name mystormcluster
+```
+
+More information is on the storm-deploy 
[wiki](https://github.com/nathanmarz/storm-deploy/wiki)

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Spout-implementations.md
----------------------------------------------------------------------
diff --git a/docs/Spout-implementations.md b/docs/Spout-implementations.md
new file mode 100644
index 0000000..10ddd42
--- /dev/null
+++ b/docs/Spout-implementations.md
@@ -0,0 +1,8 @@
+---
+layout: documentation
+---
+* [storm-kestrel](https://github.com/nathanmarz/storm-kestrel): Adapter to use 
Kestrel as a spout
+* [storm-amqp-spout](https://github.com/rapportive-oss/storm-amqp-spout): 
Adapter to use AMQP source as a spout
+* [storm-jms](https://github.com/ptgoetz/storm-jms): Adapter to use a JMS 
source as a spout
+* [storm-redis-pubsub](https://github.com/sorenmacbeth/storm-redis-pubsub): A 
spout that subscribes to a Redis pubsub stream
+* 
[storm-beanstalkd-spout](https://github.com/haitaoyao/storm-beanstalkd-spout): 
A spout that subscribes to a beanstalkd queue

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Storm-multi-language-protocol-(versions-0.7.0-and-below).md
----------------------------------------------------------------------
diff --git a/docs/Storm-multi-language-protocol-(versions-0.7.0-and-below).md 
b/docs/Storm-multi-language-protocol-(versions-0.7.0-and-below).md
new file mode 100644
index 0000000..1d4422f
--- /dev/null
+++ b/docs/Storm-multi-language-protocol-(versions-0.7.0-and-below).md
@@ -0,0 +1,122 @@
+---
+layout: documentation
+---
+This page explains the multilang protocol for versions 0.7.0 and below. The 
protocol changed in version 0.7.1.
+
+# Storm Multi-Language Protocol
+
+## The ShellBolt
+
+Support for multiple languages is implemented via the ShellBolt class.  This
+class implements the IBolt interfaces and implements the protocol for
+executing a script or program via the shell using Java's ProcessBuilder class.
+
+## Output fields
+
+Output fields are part of the Thrift definition of the topology. This means 
that when you multilang in Java, you need to create a bolt that extends 
ShellBolt, implements IRichBolt, and declared the fields in 
`declareOutputFields`. 
+You can learn more about this on [Concepts](Concepts.html)
+
+## Protocol Preamble
+
+A simple protocol is implemented via the STDIN and STDOUT of the executed
+script or program. A mix of simple strings and JSON encoded data are exchanged
+with the process making support possible for pretty much any language.
+
+# Packaging Your Stuff
+
+To run a ShellBolt on a cluster, the scripts that are shelled out to must be
+in the `resources/` directory within the jar submitted to the master.
+
+However, During development or testing on a local machine, the resources
+directory just needs to be on the classpath.
+
+## The Protocol
+
+Notes:
+* Both ends of this protocol use a line-reading mechanism, so be sure to
+trim off newlines from the input and to append them to your output.
+* All JSON inputs and outputs are terminated by a single line contained "end".
+* The bullet points below are written from the perspective of the script 
writer's
+STDIN and STDOUT.
+
+
+* Your script will be executed by the Bolt.
+* STDIN: A string representing a path. This is a PID directory.
+Your script should create an empty file named with it's pid in this directory. 
e.g.
+the PID is 1234, so an empty file named 1234 is created in the directory. This
+file lets the supervisor know the PID so it can shutdown the process later on.
+* STDOUT: Your PID. This is not JSON encoded, just a string. ShellBolt will 
log the PID to its log.
+* STDIN: (JSON) The Storm configuration.  Various settings and properties.
+* STDIN: (JSON) The Topology context
+* The rest happens in a while(true) loop
+* STDIN: A tuple! This is a JSON encoded structure like this:
+
+```
+{
+    // The tuple's id
+       "id": -6955786537413359385,
+       // The id of the component that created this tuple
+       "comp": 1,
+       // The id of the stream this tuple was emitted to
+       "stream": 1,
+       // The id of the task that created this tuple
+       "task": 9,
+       // All the values in this tuple
+       "tuple": ["snow white and the seven dwarfs", "field2", 3]
+}
+```
+
+* STDOUT: The results of your bolt, JSON encoded. This can be a sequence of 
acks, fails, emits, and/or logs. Emits look like:
+
+```
+{
+       "command": "emit",
+       // The ids of the tuples this output tuples should be anchored to
+       "anchors": [1231231, -234234234],
+       // The id of the stream this tuple was emitted to. Leave this empty to 
emit to default stream.
+       "stream": 1,
+       // If doing an emit direct, indicate the task to sent the tuple to
+       "task": 9,
+       // All the values in this tuple
+       "tuple": ["field1", 2, 3]
+}
+```
+
+An ack looks like:
+
+```
+{
+       "command": "ack",
+       // the id of the tuple to ack
+       "id": 123123
+}
+```
+
+A fail looks like:
+
+```
+{
+       "command": "fail",
+       // the id of the tuple to fail
+       "id": 123123
+}
+```
+
+A "log" will log a message in the worker log. It looks like:
+
+```
+{
+       "command": "log",
+       // the message to log
+       "msg": "hello world!"
+
+}
+```
+
+* STDOUT: emit "sync" as a single line by itself when the bolt has finished 
emitting/acking/failing and is ready for the next input
+
+### sync
+
+Note: This command is not JSON encoded, it is sent as a simple string.
+
+This lets the parent bolt know that the script has finished processing and is 
ready for another tuple.

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Structure-of-the-codebase.md
----------------------------------------------------------------------
diff --git a/docs/Structure-of-the-codebase.md 
b/docs/Structure-of-the-codebase.md
new file mode 100644
index 0000000..8ac66f4
--- /dev/null
+++ b/docs/Structure-of-the-codebase.md
@@ -0,0 +1,140 @@
+---
+layout: documentation
+---
+There are three distinct layers to Storm's codebase.
+
+First, Storm was designed from the very beginning to be compatible with 
multiple languages. Nimbus is a Thrift service and topologies are defined as 
Thrift structures. The usage of Thrift allows Storm to be used from any 
language.
+
+Second, all of Storm's interfaces are specified as Java interfaces. So even 
though there's a lot of Clojure in Storm's implementation, all usage must go 
through the Java API. This means that every feature of Storm is always 
available via Java.
+
+Third, Storm's implementation is largely in Clojure. Line-wise, Storm is about 
half Java code, half Clojure code. But Clojure is much more expressive, so in 
reality the great majority of the implementation logic is in Clojure. 
+
+The following sections explain each of these layers in more detail.
+
+### storm.thrift
+
+The first place to look to understand the structure of Storm's codebase is the 
[storm.thrift](https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift)
 file.
+
+Storm uses [this fork](https://github.com/nathanmarz/thrift/tree/storm) of 
Thrift (branch 'storm') to produce the generated code. This "fork" is actually 
Thrift 7 with all the Java packages renamed to be `org.apache.thrift7`. 
Otherwise, it's identical to Thrift 7. This fork was done because of the lack 
of backwards compatibility in Thrift and the need for many people to use other 
versions of Thrift in their Storm topologies.
+
+Every spout or bolt in a topology is given a user-specified identifier called 
the "component id". The component id is used to specify subscriptions from a 
bolt to the output streams of other spouts or bolts. A 
[StormTopology](https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L91)
 structure contains a map from component id to component for each type of 
component (spouts and bolts).
+
+Spouts and bolts have the same Thrift definition, so let's just take a look at 
the [Thrift definition for 
bolts](https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift#L79).
 It contains a `ComponentObject` struct and a `ComponentCommon` struct.
+
+The `ComponentObject` defines the implementation for the bolt. It can be one 
of three types:
+
+1. A serialized java object (that implements 
[IBolt](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/task/IBolt.java))
+2. A `ShellComponent` object that indicates the implementation is in another 
language. Specifying a bolt this way will cause Storm to instantiate a 
[ShellBolt](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/task/ShellBolt.java)
 object to handle the communication between the JVM-based worker process and 
the non-JVM-based implementation of the component.
+3. A `JavaObject` structure which tells Storm the classname and constructor 
arguments to use to instantiate that bolt. This is useful if you want to define 
a topology in a non-JVM language. This way, you can make use of JVM-based 
spouts and bolts without having to create and serialize a Java object yourself.
+
+`ComponentCommon` defines everything else for this component. This includes:
+
+1. What streams this component emits and the metadata for each stream (whether 
it's a direct stream, the fields declaration)
+2. What streams this component consumes (specified as a map from 
component_id:stream_id to the stream grouping to use)
+3. The parallelism for this component
+4. The component-specific 
[configuration](https://github.com/apache/incubator-storm/wiki/Configuration) 
for this component
+
+Note that the structure spouts also have a `ComponentCommon` field, and so 
spouts can also have declarations to consume other input streams. Yet the Storm 
Java API does not provide a way for spouts to consume other streams, and if you 
put any input declarations there for a spout you would get an error when you 
tried to submit the topology. The reason that spouts have an input declarations 
field is not for users to use, but for Storm itself to use. Storm adds implicit 
streams and bolts to the topology to set up the [acking 
framework](https://github.com/apache/incubator-storm/wiki/Guaranteeing-message-processing),
 and two of these implicit streams are from the acker bolt to each spout in the 
topology. The acker sends "ack" or "fail" messages along these streams whenever 
a tuple tree is detected to be completed or failed. The code that transforms 
the user's topology into the runtime topology is located 
[here](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/back
 type/storm/daemon/common.clj#L279).
+
+### Java interfaces
+
+The interfaces for Storm are generally specified as Java interfaces. The main 
interfaces are:
+
+1. [IRichBolt](javadocs/backtype/storm/topology/IRichBolt.html)
+2. [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html)
+3. [TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html)
+
+The strategy for the majority of the interfaces is to:
+
+1. Specify the interface using a Java interface
+2. Provide a base class that provides default implementations when appropriate
+
+You can see this strategy at work with the 
[BaseRichSpout](javadocs/backtype/storm/topology/base/BaseRichSpout.html) 
class. 
+
+Spouts and bolts are serialized into the Thrift definition of the topology as 
described above. 
+
+One subtle aspect of the interfaces is the difference between `IBolt` and 
`ISpout` vs. `IRichBolt` and `IRichSpout`. The main difference between them is 
the addition of the `declareOutputFields` method in the "Rich" versions of the 
interfaces. The reason for the split is that the output fields declaration for 
each output stream needs to be part of the Thrift struct (so it can be 
specified from any language), but as a user you want to be able to declare the 
streams as part of your class. What `TopologyBuilder` does when constructing 
the Thrift representation is call `declareOutputFields` to get the declaration 
and convert it into the Thrift structure. The conversion happens [at this 
portion](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/topology/TopologyBuilder.java#L205)
 of the `TopologyBuilder` code. 
+
+
+### Implementation
+
+Specifying all the functionality via Java interfaces ensures that every 
feature of Storm is available via Java. Moreso, the focus on Java interfaces 
ensures that the user experience from Java-land is pleasant as well.
+
+The implementation of Storm, on the other hand, is primarily in Clojure. While 
the codebase is about 50% Java and 50% Clojure in terms of LOC, most of the 
implementation logic is in Clojure. There are two notable exceptions to this, 
and that is the 
[DRPC](https://github.com/apache/incubator-storm/wiki/Distributed-RPC) and 
[transactional 
topologies](https://github.com/apache/incubator-storm/wiki/Transactional-topologies)
 implementations. These are implemented purely in Java. This was done to serve 
as an illustration for how to implement a higher level abstraction on Storm. 
The DRPC and transactional topologies implementations are in the 
[backtype.storm.coordination](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/coordination),
 
[backtype.storm.drpc](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/drpc),
 and 
[backtype.storm.transactional](https://github.com/apache/incubator-storm/tree/master/storm-core/src
 /jvm/backtype/storm/transactional) packages.
+
+Here's a summary of the purpose of the main Java packages and Clojure 
namespace:
+
+#### Java packages
+
+[backtype.storm.coordination](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/coordination):
 Implements the pieces required to coordinate batch-processing on top of Storm, 
which both DRPC and transactional topologies use. `CoordinatedBolt` is the most 
important class here.
+
+[backtype.storm.drpc](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/drpc):
 Implementation of the DRPC higher level abstraction
+
+[backtype.storm.generated](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/generated):
 The generated Thrift code for Storm (generated using [this 
fork](https://github.com/nathanmarz/thrift) of Thrift, which simply renames the 
packages to org.apache.thrift7 to avoid conflicts with other Thrift versions)
+
+[backtype.storm.grouping](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/grouping):
 Contains interface for making custom stream groupings
+
+[backtype.storm.hooks](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/hooks):
 Interfaces for hooking into various events in Storm, such as when tasks emit 
tuples, when tuples are acked, etc. User guide for hooks is 
[here](https://github.com/apache/incubator-storm/wiki/Hooks).
+
+[backtype.storm.serialization](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/serialization):
 Implementation of how Storm serializes/deserializes tuples. Built on top of 
[Kryo](http://code.google.com/p/kryo/).
+
+[backtype.storm.spout](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/spout):
 Definition of spout and associated interfaces (like the 
`SpoutOutputCollector`). Also contains `ShellSpout` which implements the 
protocol for defining spouts in non-JVM languages.
+
+[backtype.storm.task](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/task):
 Definition of bolt and associated interfaces (like `OutputCollector`). Also 
contains `ShellBolt` which implements the protocol for defining bolts in 
non-JVM languages. Finally, `TopologyContext` is defined here as well, which is 
provided to spouts and bolts so they can get data about the topology and its 
execution at runtime.
+
+[backtype.storm.testing](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/testing):
 Contains a variety of test bolts and utilities used in Storm's unit tests.
+
+[backtype.storm.topology](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/topology):
 Java layer over the underlying Thrift structure to provide a clean, pure-Java 
API to Storm (users don't have to know about Thrift). `TopologyBuilder` is here 
as well as the helpful base classes for the different spouts and bolts. The 
slightly-higher level `IBasicBolt` interface is here, which is a simpler way to 
write certain kinds of bolts.
+
+[backtype.storm.transactional](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/transactional):
 Implementation of transactional topologies.
+
+[backtype.storm.tuple](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/tuple):
 Implementation of Storm's tuple data model.
+
+[backtype.storm.utils](https://github.com/apache/incubator-storm/tree/master/storm-core/src/jvm/backtype/storm/tuple):
 Data structures and miscellaneous utilities used throughout the codebase.
+
+
+#### Clojure namespaces
+
+[backtype.storm.bootstrap](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/bootstrap.clj):
 Contains a helpful macro to import all the classes and namespaces that are 
used throughout the codebase.
+
+[backtype.storm.clojure](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/clojure.clj):
 Implementation of the Clojure DSL for Storm.
+
+[backtype.storm.cluster](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/cluster.clj):
 All Zookeeper logic used in Storm daemons is encapsulated in this file. This 
code manages how cluster state (like what tasks are running where, what 
spout/bolt each task runs as) is mapped to the Zookeeper "filesystem" API.
+
+[backtype.storm.command.*](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/command):
 These namespaces implement various commands for the `storm` command line 
client. These implementations are very short.
+
+[backtype.storm.config](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/config.clj):
 Implementation of config reading/parsing code for Clojure. Also has utility 
functions for determining what local path nimbus/supervisor/daemons should be 
using for various things. e.g. the `master-inbox` function will return the 
local path that Nimbus should use when jars are uploaded to it.
+
+[backtype.storm.daemon.acker](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/acker.clj):
 Implementation of the "acker" bolt, which is a key part of how Storm 
guarantees data processing.
+
+[backtype.storm.daemon.common](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/common.clj):
 Implementation of common functions used in Storm daemons, like getting the id 
for a topology based on the name, mapping a user's topology into the one that 
actually executes (with implicit acking streams and acker bolt added - see 
`system-topology!` function), and definitions for the various heartbeat and 
other structures persisted by Storm.
+
+[backtype.storm.daemon.drpc](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/drpc.clj):
 Implementation of the DRPC server for use with DRPC topologies.
+
+[backtype.storm.daemon.nimbus](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/nimbus.clj):
 Implementation of Nimbus.
+
+[backtype.storm.daemon.supervisor](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj):
 Implementation of Supervisor.
+
+[backtype.storm.daemon.task](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/task.clj):
 Implementation of an individual task for a spout or bolt. Handles message 
routing, serialization, stats collection for the UI, as well as the 
spout-specific and bolt-specific execution implementations.
+
+[backtype.storm.daemon.worker](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj):
 Implementation of a worker process (which will contain many tasks within). 
Implements message transferring and task launching.
+
+[backtype.storm.event](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/event.clj):
 Implements a simple asynchronous function executor. Used in various places in 
Nimbus and Supervisor to make functions execute in serial to avoid any race 
conditions.
+
+[backtype.storm.log](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/log.clj):
 Defines the functions used to log messages to log4j.
+
+[backtype.storm.messaging.*](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/messaging):
 Defines a higher level interface to implementing point to point messaging. In 
local mode Storm uses in-memory Java queues to do this; on a cluster, it uses 
ZeroMQ. The generic interface is defined in protocol.clj.
+
+[backtype.storm.stats](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/stats.clj):
 Implementation of stats rollup routines used when sending stats to ZK for use 
by the UI. Does things like windowed and rolling aggregations at multiple 
granularities.
+
+[backtype.storm.testing](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/testing.clj):
 Implementation of facilities used to test Storm topologies. Includes time 
simulation, `complete-topology` for running a fixed set of tuples through a 
topology and capturing the output, tracker topologies for having fine grained 
control over detecting when a cluster is "idle", and other utilities.
+
+[backtype.storm.thrift](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/thrift.clj):
 Clojure wrappers around the generated Thrift API to make working with Thrift 
structures more pleasant.
+
+[backtype.storm.timer](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/timer.clj):
 Implementation of a background timer to execute functions in the future or on 
a recurring interval. Storm couldn't use the 
[Timer](http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Timer.html) 
class because it needed integration with time simulation in order to be able to 
unit test Nimbus and the Supervisor.
+
+[backtype.storm.ui.*](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/ui):
 Implementation of Storm UI. Completely independent from rest of code base and 
uses the Nimbus Thrift API to get data.
+
+[backtype.storm.util](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/util.clj):
 Contains generic utility functions used throughout the code base.
+ 
+[backtype.storm.zookeeper](https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/zookeeper.clj):
 Clojure wrapper around the Zookeeper API and implements some "high-level" 
stuff like "mkdirs" and "delete-recursive".

http://git-wip-us.apache.org/repos/asf/storm/blob/abbfb973/docs/Support-for-non-java-languages.md
----------------------------------------------------------------------
diff --git a/docs/Support-for-non-java-languages.md 
b/docs/Support-for-non-java-languages.md
new file mode 100644
index 0000000..724d106
--- /dev/null
+++ b/docs/Support-for-non-java-languages.md
@@ -0,0 +1,7 @@
+---
+layout: documentation
+---
+* [Scala DSL](https://github.com/velvia/ScalaStorm)
+* [JRuby DSL](https://github.com/colinsurprenant/storm-jruby)
+* [Clojure DSL](Clojure-DSL.html)
+* [io-storm](https://github.com/gphat/io-storm): Perl multilang adapter

[08/32] storm git commit: STORM-1617: Release Specific Documentation 0.9.x

Reply via email to