Author: bobby
Date: Thu Mar 17 22:05:29 2016
New Revision: 1735510
URL: http://svn.apache.org/viewvc?rev=1735510&view=rev
Log:
Updated some of 0.10.0 to match what is in asf_site in git
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md
storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md
storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md
storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md
storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md
storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md
storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md
storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md
storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md
storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
Thu Mar 17 22:05:29 2016
@@ -4,7 +4,7 @@ layout: documentation
documentation: true
version: v0.10.0
---
-This page describes all the commands that are possible with the "storm"
command line client. To learn how to set up your "storm" client to talk to a
remote cluster, follow the instructions in [Setting up development
environment](Setting-up-a-development-environment.html).
+This page describes all the commands that are possible with the "storm"
command line client. To learn how to set up your "storm" client to talk to a
remote cluster, follow the instructions in [Setting up development
environment](Setting-up-development-environment.html).
These commands are:
@@ -48,12 +48,14 @@ Deactivates the specified topology's spo
### rebalance
-Syntax: `storm rebalance topology-name [-w wait-time-secs]`
+Syntax: `storm rebalance topology-name [-w wait-time-secs] [-n
new-num-workers] [-e component=parallelism]*`
Sometimes you may wish to spread out where the workers for a topology are
running. For example, let's say you have a 10 node cluster running 4 workers
per node, and then let's say you add another 10 nodes to the cluster. You may
wish to have Storm spread out the workers for the running topology so that each
node runs 2 workers. One way to do this is to kill the topology and resubmit
it, but Storm provides a "rebalance" command that provides an easier way to do
this.
Rebalance will first deactivate the topology for the duration of the message
timeout (overridable with the -w flag) and then redistribute the workers evenly
around the cluster. The topology will then return to its previous state of
activation (so a deactivated topology will still be deactivated and an
activated topology will go back to being activated).
+The rebalance command can also be used to change the parallelism of a running
topology. Use the -n and -e switches to change the number of workers or number
of executors of a component respectively.
+
### repl
Syntax: `storm repl`
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md Thu Mar 17
22:05:29 2016
@@ -79,7 +79,7 @@ Its perfectly fine to launch new threads
Part of defining a topology is specifying for each bolt which streams it
should receive as input. A stream grouping defines how that stream should be
partitioned among the bolt's tasks.
-There are seven built-in stream groupings in Storm, and you can implement a
custom stream grouping by implementing the
[CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html)
interface:
+There are eight built-in stream groupings in Storm, and you can implement a
custom stream grouping by implementing the
[CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html)
interface:
1. **Shuffle grouping**: Tuples are randomly distributed across the bolt's
tasks in a way such that each bolt is guaranteed to get an equal number of
tuples.
2. **Fields grouping**: The stream is partitioned by the fields specified in
the grouping. For example, if the stream is grouped by the "user-id" field,
tuples with the same "user-id" will always go to the same task, but tuples with
different "user-id"'s may go to different tasks.
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md Thu
Mar 17 22:05:29 2016
@@ -6,7 +6,7 @@ version: v0.10.0
---
Storm has a variety of configurations for tweaking the behavior of nimbus,
supervisors, and running topologies. Some configurations are system
configurations and cannot be modified on a topology by topology basis, whereas
other configurations can be modified per topology.
-Every configuration has a default value defined in
[defaults.yaml](https://github.com/apache/storm/{{page.version}}/master/conf/defaults.yaml)
in the Storm codebase. You can override these configurations by defining a
storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can
define a topology-specific configuration that you submit along with your
topology when using
[StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, the
topology-specific configuration can only override configs prefixed with
"TOPOLOGY".
+Every configuration has a default value defined in
[defaults.yaml](https://github.com/apache/storm/blob/{{page.version}}/conf/defaults.yaml)
in the Storm codebase. You can override these configurations by defining a
storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can
define a topology-specific configuration that you submit along with your
topology when using
[StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, the
topology-specific configuration can only override configs prefixed with
"TOPOLOGY".
Storm 0.7.0 and onwards lets you override configuration on a
per-bolt/per-spout basis. The only configurations that can be overriden this
way are:
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md Thu Mar 17
22:05:29 2016
@@ -29,7 +29,7 @@ version: v0.10.0
### Halp! I cannot see:
-* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have
write permissions to that directory. They are configured in the
logback/cluster.xml (0.9) and log4j/*.properties in earlier versions.
+* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have
write permissions to that directory. They are configured in log4j2/{cluster,
worker}.xml.
* **final JVM settings** Add the `-XX+PrintFlagsFinal` commandline option in
the childopts (see the conf file)
* **final Java system properties** Add `Properties props =
System.getProperties(); props.list(System.out);` near where you build your
topology.
@@ -63,6 +63,10 @@ You can join streams with join, merge or
At time of writing, you can't emit to multiple output streams from Trident --
see [STORM-68](https://issues.apache.org/jira/browse/STORM-68)
+### Why am I getting a NotSerializableException/IllegalStateException when my
topology is being started up?
+
+Within the Storm lifecycle, the topology is instantiated and then serialized
to byte format to be stored in ZooKeeper, prior to the topology being executed.
Within this step, if a spout or bolt within the topology has an initialized
unserializable property, serialization will fail. If there is a need for a
field that is unserializable, initialize it within the bolt or spout's prepare
method, which is run after the topology is delivered to the worker.
+
## Spouts
### What is a coordinator, and why are there several?
@@ -110,7 +114,7 @@ You can't change the overall batch size
### How do I aggregate events by time?
-If have records with an immutable timestamp, and you would like to count,
average or otherwise aggregate them into discrete time buckets, Trident is an
excellent and scalable solution.
+If you have records with an immutable timestamp, and you would like to count,
average or otherwise aggregate them into discrete time buckets, Trident is an
excellent and scalable solution.
Write an `Each` function that turns the timestamp into a time bucket: if the
bucket size was "by hour", then the timestamp `2013-08-08 12:34:56` would be
mapped to the `2013-08-08 12:00:00` time bucket, and so would everything else
in the twelve o'clock hour. Then group on that timebucket and use a grouped
persistentAggregate. The persistentAggregate uses a local cacheMap backed by a
data store. Groups with many records require very few reads from the data
store, and use efficient bulk reads and writes; as long as your data feed is
relatively prompt Trident will make very efficient use of memory and network.
Even if a server drops off line for a day, then delivers that full day's worth
of data in a rush, the old results will be calmly retrieved and updated -- and
without interfering with calculating the current results.
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md Thu Mar 17
22:05:29 2016
@@ -6,5 +6,5 @@ version: v0.10.0
---
Storm provides hooks with which you can insert custom code to run on any
number of events within Storm. You create a hook by extending the
[BaseTaskHook](javadocs/backtype/storm/hooks/BaseTaskHook.html) class and
overriding the appropriate method for the event you want to catch. There are
two ways to register your hook:
-1. In the open method of your spout or prepare method of your bolt using the
[TopologyContext#addTaskHook](javadocs/backtype/storm/task/TopologyContext.html)
method.
+1. In the open method of your spout or prepare method of your bolt using the
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html#addTaskHook)
method.
2. Through the Storm configuration using the
["topology.auto.task.hooks"](javadocs/backtype/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS)
config. These hooks are automatically registered in every spout or bolt, and
are useful for doing things like integrating with a custom monitoring system.
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
Thu Mar 17 22:05:29 2016
@@ -8,7 +8,7 @@ This page explains how to use to Storm t
## Preliminaries
### Storm
-This tutorial uses examples from the
[storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the
[storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter)
project. It's recommended that you clone those projects and follow along with
the examples. Read [Setting up development
environment](https://github.com/apache/storm/wiki/Setting-up-development-environment)
and [Creating a new Storm
project](https://github.com/apache/storm/wiki/Creating-a-new-Storm-project) to
get your machine set up.
+This tutorial uses examples from the
[storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the
[storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter)
project. It's recommended that you clone those projects and follow along with
the examples. Read [Setting up development
environment](Setting-up-development-environment.html) and [Creating a new Storm
project](Creating-a-new-Storm-project.html) to get your machine set up.
### Kestrel
It assumes you are able to run locally a Kestrel server as described
[here](https://github.com/nathanmarz/storm-kestrel).
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md Thu Mar 17
22:05:29 2016
@@ -1,5 +1,7 @@
---
+title: Maven
layout: documentation
+documentation: true
version: v0.10.0
raw-version: 0.10.0
---
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
Thu Mar 17 22:05:29 2016
@@ -9,23 +9,23 @@ version: v0.10.0
This page walks through how emitting and transferring tuples works in Storm.
- Worker is responsible for message transfer
- - `refresh-connections` is called every "task.refresh.poll.secs" or
whenever assignment in ZK changes. It manages connections to other workers and
maintains a mapping from task -> worker
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
- - Provides a "transfer function" that is used by tasks to send tuples to
other tasks. The transfer function takes in a task id and a tuple, and it
serializes the tuple and puts it onto a "transfer queue". There is a single
transfer queue for each worker.
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
- - The serializer is thread-safe
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
- - The worker has a single thread which drains the transfer queue and sends
the messages to other workers
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
- - Message sending happens through this protocol:
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
- - The implementation for distributed mode uses ZeroMQ
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
- - The implementation for local mode uses in memory Java queues (so that
it's easy to use Storm locally without needing to get ZeroMQ installed)
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
+ - `refresh-connections` is called every "task.refresh.poll.secs" or
whenever assignment in ZK changes. It manages connections to other workers and
maintains a mapping from task -> worker
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
+ - Provides a "transfer function" that is used by tasks to send tuples to
other tasks. The transfer function takes in a task id and a tuple, and it
serializes the tuple and puts it onto a "transfer queue". There is a single
transfer queue for each worker.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
+ - The serializer is thread-safe
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
+ - The worker has a single thread which drains the transfer queue and sends
the messages to other workers
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
+ - Message sending happens through this protocol:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
+ - The implementation for distributed mode uses ZeroMQ
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
+ - The implementation for local mode uses in memory Java queues (so that
it's easy to use Storm locally without needing to get ZeroMQ installed)
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
- Receiving messages in tasks works differently in local mode and distributed
mode
- - In local mode, the tuple is sent directly to an in-memory queue for the
receiving task
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/messaging/local.clj#L21)
- - In distributed mode, each worker listens on a single TCP port for
incoming messages and then routes those messages in-memory to tasks. The TCP
port is called a "virtual port", because it receives [task id, message] and
then routes it to the actual task.
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/worker.clj#L204)
- - The virtual port implementation is here:
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/zilch/virtual_port.clj)
- - Tasks listen on an in-memory ZeroMQ port for messages from the virtual
port
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L201)
- - Bolts listen here:
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L489)
- - Spouts listen here:
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L382)
+ - In local mode, the tuple is sent directly to an in-memory queue for the
receiving task
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj#L21)
+ - In distributed mode, each worker listens on a single TCP port for
incoming messages and then routes those messages in-memory to tasks. The TCP
port is called a "virtual port", because it receives [task id, message] and
then routes it to the actual task.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L204)
+ - The virtual port implementation is here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/zilch/virtual_port.clj)
+ - Tasks listen on an in-memory ZeroMQ port for messages from the virtual
port
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L201)
+ - Bolts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L489)
+ - Spouts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L382)
- Tasks are responsible for message routing. A tuple is emitted either to a
direct stream (where the task id is specified) or a regular stream. In direct
streams, the message is only sent if that bolt subscribes to that direct
stream. In regular streams, the stream grouping functions are used to determine
the task ids to send the tuple to.
- - Tasks have a routing map from {stream id} -> {component id} -> {stream
grouping function}
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L198)
- - The "tasks-fn" returns the task ids to send the tuples to for either
regular stream emit or direct stream emit
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L207)
+ - Tasks have a routing map from {stream id} -> {component id} -> {stream
grouping function}
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L198)
+ - The "tasks-fn" returns the task ids to send the tuples to for either
regular stream emit or direct stream emit
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
- After getting the output task ids, bolts and spouts use the transfer-fn
provided by the worker to actually transfer the tuples
- - Bolt transfer code here:
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L429)
- - Spout transfer code here:
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L329)
+ - Bolt transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L429)
+ - Spout transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L329)
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
Thu Mar 17 22:05:29 2016
@@ -51,7 +51,7 @@ You can find out how to configure your `
There are a variety of configurations you can set per topology. A list of all
the configurations you can set can be found
[here](javadocs/backtype/storm/Config.html). The ones prefixed with "TOPOLOGY"
can be overridden on a topology-specific basis (the other ones are cluster
configurations and cannot be overridden). Here are some common ones that are
set for a topology:
1. **Config.TOPOLOGY_WORKERS**: This sets the number of worker processes to
use to execute the topology. For example, if you set this to 25, there will be
25 Java processes across the cluster executing all the tasks. If you had a
combined 150 parallelism across all components in the topology, each worker
process will have 6 tasks running within it as threads.
-2. **Config.TOPOLOGY_ACKERS**: This sets the number of tasks that will track
tuple trees and detect when a spout tuple has been fully processed. Ackers are
an integral part of Storm's reliability model and you can read more about them
on [Guaranteeing message processing](Guaranteeing-message-processing.html).
+2. **Config.TOPOLOGY_ACKER_EXECUTORS**: This sets the number of executors that
will track tuple trees and detect when a spout tuple has been fully processed.
Ackers are an integral part of Storm's reliability model and you can read more
about them on [Guaranteeing message
processing](Guaranteeing-message-processing.html). By not setting this variable
or setting it as null, Storm will set the number of acker executors to be equal
to the number of workers configured for this topology. If this variable is set
to 0, then Storm will immediately ack tuples as soon as they come off the
spout, effectively disabling reliability.
3. **Config.TOPOLOGY_MAX_SPOUT_PENDING**: This sets the maximum number of
spout tuples that can be pending on a single spout task at once (pending means
the tuple has not been acked or failed yet). It is highly recommended you set
this config to prevent queue explosion.
4. **Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS**: This is the maximum amount of
time a spout tuple has to be fully completed before it is considered failed.
This value defaults to 30 seconds, which is sufficient for most topologies. See
[Guaranteeing message processing](Guaranteeing-message-processing.html) for
more information on how Storm's reliability model works.
5. **Config.TOPOLOGY_SERIALIZATIONS**: You can register more serializers to
Storm using this config so that you can use custom types within tuples.
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md Thu
Mar 17 22:05:29 2016
@@ -8,7 +8,7 @@ This page is about how the serialization
Tuples can be comprised of objects of any types. Since Storm is a distributed
system, it needs to know how to serialize and deserialize objects when they're
passed between tasks.
-Storm uses [Kryo](http://code.google.com/p/kryo/) for serialization. Kryo is a
flexible and fast serialization library that produces small serializations.
+Storm uses [Kryo](https://github.com/EsotericSoftware/kryo) for serialization.
Kryo is a flexible and fast serialization library that produces small
serializations.
By default, Storm can serialize primitive types, strings, byte arrays,
ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to
use another type in your tuples, you'll need to register a custom serializer.
@@ -24,12 +24,12 @@ Finally, another reason for using dynami
### Custom serialization
-As mentioned, Storm uses Kryo for serialization. To implement custom
serializers, you need to register new serializers with Kryo. It's highly
recommended that you read over [Kryo's home
page](http://code.google.com/p/kryo/) to understand how it handles custom
serialization.
+As mentioned, Storm uses Kryo for serialization. To implement custom
serializers, you need to register new serializers with Kryo. It's highly
recommended that you read over [Kryo's home
page](https://github.com/EsotericSoftware/kryo) to understand how it handles
custom serialization.
Adding custom serializers is done through the "topology.kryo.register"
property in your topology config. It takes a list of registrations, where each
registration can take one of two forms:
1. The name of a class to register. In this case, Storm will use Kryo's
`FieldsSerializer` to serialize the class. This may or may not be optimal for
the class -- see the Kryo docs for more details.
-2. A map from the name of a class to register to an implementation of
[com.esotericsoftware.kryo.Serializer](http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java).
+2. A map from the name of a class to register to an implementation of
[com.esotericsoftware.kryo.Serializer](https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java).
Let's look at an example.
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
Thu Mar 17 22:05:29 2016
@@ -65,11 +65,13 @@ storm.local.dir: "C:\\storm-local"
```
If you use a relative path,it will be relative to where you installed
storm(STORM_HOME).
You can leave it empty with default value `$STORM_HOME/storm-local`
-3) **nimbus.host**: The worker nodes need to know which machine is the master
in order to download topology jars and confs. For example:
+
+3) **nimbus.seeds**: The worker nodes need to know which machines are the
candidate of master in order to download topology jars and confs. For example:
```yaml
-nimbus.host: "111.222.333.44"
+nimbus.seeds: ["111.222.333.44"]
```
+You're encouraged to fill out the value to list of **machine's FQDN**. If you
want to set up Nimbus H/A, you have to address all machines' FQDN which run
nimbus. You may want to leave it to default value when you just want to set up
'pseudo-distributed' cluster, but you're still encouraged to fill out FQDN.
4) **supervisor.slots.ports**: For each worker machine, you configure how many
workers run on that machine with this config. Each worker uses a single port
for receiving messages, and this setting defines which ports are open for use.
If you define five ports here, then Storm will allocate up to five workers to
run on this machine. If you define three ports, Storm will only run up to
three. By default, this setting is configured to run 4 workers on the ports
6700, 6701, 6702, and 6703. For example:
@@ -81,6 +83,25 @@ supervisor.slots.ports:
- 6703
```
+### Monitoring Health of Supervisors
+
+Storm provides a mechanism by which administrators can configure the
supervisor to run administrator supplied scripts periodically to determine if a
node is healthy or not. Administrators can have the supervisor determine if the
node is in a healthy state by performing any checks of their choice in scripts
located in storm.health.check.dir. If a script detects the node to be in an
unhealthy state, it must print a line to standard output beginning with the
string ERROR. The supervisor will periodically run the scripts in the health
check dir and check the output. If the scriptâs output contains the string
ERROR, as described above, the supervisor will shut down any workers and exit.
+
+If the supervisor is running with supervision "/bin/storm node-health-check"
can be called to determine if the supervisor should be launched or if the node
is unhealthy.
+
+The health check directory location can be configured with:
+
+```yaml
+storm.health.check.dir: "healthchecks"
+
+```
+The scripts must have execute permissions.
+The time to allow any given healthcheck script to run before it is marked
failed due to timeout can be configured with:
+
+```yaml
+storm.health.check.timeout.ms: 5000
+```
+
### Configure external libraries and environmental variables (optional)
If you need support from external libraries or custom plugins, you can place
such jars into the extlib/ and extlib-daemon/ directories. Note that the
extlib-daemon/ directory stores jars used only by daemons (Nimbus, Supervisor,
DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries.
Accordingly, two environmental variables STORM_EXT_CLASSPATH and
STORM_EXT_CLASSPATH_DAEMON can be configured by users for including the
external classpath and daemon-only external classpath.
@@ -92,6 +113,6 @@ The last step is to launch all the Storm
1. **Nimbus**: Run the command "bin/storm nimbus" under supervision on the
master machine.
2. **Supervisor**: Run the command "bin/storm supervisor" under supervision on
each worker machine. The supervisor daemon is responsible for starting and
stopping worker processes on that machine.
-3. **UI**: Run the Storm UI (a site you can access from the browser that gives
diagnostics on the cluster and topologies) by running the command "bin/storm
ui" under supervision. The UI can be accessed by navigating your web browser to
http://{nimbus host}:8080.
+3. **UI**: Run the Storm UI (a site you can access from the browser that gives
diagnostics on the cluster and topologies) by running the command "bin/storm
ui" under supervision. The UI can be accessed by navigating your web browser to
http://{ui host}:8080.
As you can see, running the daemons is very straightforward. The daemons will
log to the logs/ directory in wherever you extracted the Storm release.
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
Thu Mar 17 22:05:29 2016
@@ -30,13 +30,5 @@ Installing a Storm release locally is on
The previous step installed the `storm` client on your machine which is used
to communicate with remote Storm clusters. Now all you have to do is tell the
client which Storm cluster to talk to. To do this, all you have to do is put
the host address of the master in the `~/.storm/storm.yaml` file. It should
look something like this:
```
-nimbus.host: "123.45.678.890"
+nimbus.seeds: ["123.45.678.890"]
```
-
-Alternatively, if you use the
[storm-deploy](https://github.com/nathanmarz/storm-deploy) project to provision
Storm clusters on AWS, it will automatically set up your ~/.storm/storm.yaml
file. You can manually attach to a Storm cluster (or switch between multiple
clusters) using the "attach" command, like so:
-
-```
-lein run :deploy --attach --name mystormcluster
-```
-
-More information is on the storm-deploy
[wiki](https://github.com/nathanmarz/storm-deploy/wiki)
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
Thu Mar 17 22:05:29 2016
@@ -78,15 +78,228 @@ Now suppose you had these tuples with fi
If you ran this code:
```java
-mystream.each(new Fields("b", "a"), new MyFilter())
+mystream.filter(new MyFilter())
```
The resulting tuples would be:
```
-[2, 1, 1]
+[1, 2, 3]
```
+### map and flatMap
+
+`map` returns a stream consisting of the result of applying the given mapping
function to the tuples of the stream. This
+can be used to apply a one-one transformation to the tuples.
+
+For example, if there is a stream of words and you wanted to convert it to a
stream of upper case words,
+you could define a mapping function as follows,
+
+```java
+public class UpperCase extends MapFunction {
+ @Override
+ public Values execute(TridentTuple input) {
+ return new Values(input.getString(0).toUpperCase());
+ }
+}
+```
+
+The mapping function can then be applied on the stream to produce a stream of
uppercase words.
+
+```java
+mystream.map(new UpperCase())
+```
+
+`flatMap` is similar to `map` but has the effect of applying a one-to-many
transformation to the values of the stream,
+and then flattening the resulting elements into a new stream.
+
+For example, if there is a stream of sentences and you wanted to convert it to
a stream of words,
+you could define a flatMap function as follows,
+
+```java
+public class Split extends FlatMapFunction {
+ @Override
+ public Iterable<Values> execute(TridentTuple input) {
+ List<Values> valuesList = new ArrayList<>();
+ for (String word : input.getString(0).split(" ")) {
+ valuesList.add(new Values(word));
+ }
+ return valuesList;
+ }
+}
+```
+
+The flatMap function can then be applied on the stream of sentences to produce
a stream of words,
+
+```java
+mystream.flatMap(new Split())
+```
+
+Of course these operations can be chained, so a stream of uppercase words can
be obtained from a stream of sentences as follows,
+
+```java
+mystream.flatMap(new Split()).map(new UpperCase())
+```
+### peek
+`peek` can be used to perform an additional action on each trident tuple as
they flow through the stream.
+ This could be useful for debugging to see the tuples as they flow past a
certain point in a pipeline.
+
+For example, the below code would print the result of converting the words to
uppercase before they are passed to `groupBy`
+```java
+ mystream.flatMap(new Split()).map(new UpperCase())
+ .peek(new Consumer() {
+ @Override
+ public void accept(TridentTuple input) {
+ System.out.println(input.getString(0));
+ }
+ })
+ .groupBy(new Fields("word"))
+ .persistentAggregate(new MemoryMapState.Factory(), new Count(), new
Fields("count"))
+```
+
+### min and minBy
+`min` and `minBy` operations return minimum value on each partition of a batch
of tuples in a trident stream.
+
+Suppose, a trident stream contains fields ["device-id", "count"] and the
following partitions of tuples
+
+```
+Partition 0:
+[123, 2]
+[113, 54]
+[23, 28]
+[237, 37]
+[12, 23]
+[62, 17]
+[98, 42]
+
+Partition 1:
+[64, 18]
+[72, 54]
+[2, 28]
+[742, 71]
+[98, 45]
+[62, 12]
+[19, 174]
+
+
+Partition 2:
+[27, 94]
+[82, 23]
+[9, 86]
+[53, 71]
+[74, 37]
+[51, 49]
+[37, 98]
+
+```
+
+`minBy` operation can be applied on the above stream of tuples like below
which results in emitting tuples with minimum values of `count` field in each
partition.
+
+``` java
+ mystream.minBy(new Fields("count"))
+```
+Result of the above code on mentioned partitions is:
+
+```
+Partition 0:
+[123, 2]
+
+
+Partition 1:
+[62, 12]
+
+
+Partition 2:
+[82, 23]
+
+```
+
+You can look at other `min` and `minBy` operations on Stream
+``` java
+ public <T> Stream minBy(String inputFieldName, Comparator<T> comparator)
+ public Stream min(Comparator<TridentTuple> comparator)
+```
+Below example shows how these APIs can be used to find minimum using
respective Comparators on a tuple.
+
+``` java
+
+ FixedBatchSpout spout = new FixedBatchSpout(allFields, 10,
Vehicle.generateVehicles(20));
+
+ TridentTopology topology = new TridentTopology();
+ Stream vehiclesStream = topology.newStream("spout1", spout).
+ each(allFields, new Debug("##### vehicles"));
+
+ Stream slowVehiclesStream =
+ vehiclesStream
+ .min(new SpeedComparator()) // Comparator w.r.t speed
on received tuple.
+ .each(vehicleField, new Debug("#### slowest vehicle"));
+
+ vehiclesStream
+ .minBy(Vehicle.FIELD_NAME, new EfficiencyComparator()) //
Comparator w.r.t efficiency on received tuple.
+ .each(vehicleField, new Debug("#### least efficient vehicle"));
+
+```
+Example applications of these APIs can be located at
[TridentMinMaxOfDevicesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java)
and
[TridentMinMaxOfVehiclesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java)
+
+### max and maxBy
+`max` and `maxBy` operations return maximum value on each partition of a batch
of tuples in a trident stream.
+
+Suppose, a trident stream contains fields ["device-id", "count"] as mentioned
in the above section.
+
+`max` and `maxBy` operations can be applied on the above stream of tuples like
below which results in emitting tuples with maximum values of `count` field for
each partition.
+
+``` java
+ mystream.maxBy(new Fields("count"))
+```
+Result of the above code on mentioned partitions is:
+
+```
+Partition 0:
+[113, 54]
+
+
+Partition 1:
+[19, 174]
+
+
+Partition 2:
+[37, 98]
+
+```
+
+You can look at other `max` and `maxBy` functions on Stream
+
+``` java
+
+ public <T> Stream maxBy(String inputFieldName, Comparator<T> comparator)
+ public Stream max(Comparator<TridentTuple> comparator)
+
+```
+
+Below example shows how these APIs can be used to find maximum using
respective Comparators on a tuple.
+
+``` java
+
+ FixedBatchSpout spout = new FixedBatchSpout(allFields, 10,
Vehicle.generateVehicles(20));
+
+ TridentTopology topology = new TridentTopology();
+ Stream vehiclesStream = topology.newStream("spout1", spout).
+ each(allFields, new Debug("##### vehicles"));
+
+ vehiclesStream
+ .max(new SpeedComparator()) // Comparator w.r.t speed on
received tuple.
+ .each(vehicleField, new Debug("#### fastest vehicle"))
+ .project(driverField)
+ .each(driverField, new Debug("##### fastest driver"));
+
+ vehiclesStream
+ .maxBy(Vehicle.FIELD_NAME, new EfficiencyComparator()) //
Comparator w.r.t efficiency on received tuple.
+ .each(vehicleField, new Debug("#### most efficient vehicle"));
+
+```
+
+Example applications of these APIs can be located at
[TridentMinMaxOfDevicesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java)
and
[TridentMinMaxOfVehiclesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java)
+
### partitionAggregate
partitionAggregate runs a function on each partition of a batch of tuples.
Unlike functions, the tuples emitted by partitionAggregate replace the input
tuples given to it. Consider this example:
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md Thu
Mar 17 22:05:29 2016
@@ -329,4 +329,4 @@ Trident also provides the [CachedMap](ht
Finally, Trident provides the
[SnapshottableMap](https://github.com/apache/storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/map/SnapshottableMap.java)
class that turns a MapState into a Snapshottable object, by storing global
aggregations into a fixed key.
-Take a look at the implementation of
[MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/{{page.version}}/src/jvm/trident/memcached/MemcachedState.java)
to see how all these utilities can be put together to make a high performance
MapState implementation. MemcachedState allows you to choose between opaque
transactional, transactional, and non-transactional semantics.
+Take a look at the implementation of
[MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java)
to see how all these utilities can be put together to make a high performance
MapState implementation. MemcachedState allows you to choose between opaque
transactional, transactional, and non-transactional semantics.
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md Thu
Mar 17 22:05:29 2016
@@ -236,7 +236,7 @@ Trident solves this problem by doing two
With these two primitives, you can achieve exactly-once semantics with your
state updates. Rather than store just the count in the database, what you can
do instead is store the transaction id with the count in the database as an
atomic value. Then, when updating the count, you can just compare the
transaction id in the database with the transaction id for the current batch.
If they're the same, you skip the update â because of the strong ordering,
you know for sure that the value in the database incorporates the current
batch. If they're different, you increment the count.
-Of course, you don't have to do this logic manually in your topologies. This
logic is wrapped by the State abstraction and done automatically. Nor is your
State object required to implement the transaction id trick: if you don't want
to pay the cost of storing the transaction id in the database, you don't have
to. In that case the State will have at-least-once-processing semantics in the
case of failures (which may be fine for your application). You can read more
about how to implement a State and the various fault-tolerance tradeoffs
possible [in this doc](/documentation/Trident-state).
+Of course, you don't have to do this logic manually in your topologies. This
logic is wrapped by the State abstraction and done automatically. Nor is your
State object required to implement the transaction id trick: if you don't want
to pay the cost of storing the transaction id in the database, you don't have
to. In that case the State will have at-least-once-processing semantics in the
case of failures (which may be fine for your application). You can read more
about how to implement a State and the various fault-tolerance tradeoffs
possible [in this doc](/documentation/Trident-state.html).
A State is allowed to use whatever strategy it wants to store state. So it
could store state in an external database or it could keep the state in-memory
but backed by HDFS (like how HBase works). State's are not required to hold
onto state forever. For example, you could have an in-memory State
implementation that only keeps the last X hours of data available and drops
anything older. Take a look at the implementation of the [Memcached
integration](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java)
for an example State implementation.
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md
(original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md Thu
Mar 17 22:05:29 2016
@@ -141,6 +141,43 @@ Caused by: java.lang.NullPointerExceptio
... 6 more
```
+or
+
+```
+java.lang.RuntimeException: java.lang.NullPointerException
+ at
+backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.disruptor$consume_loop_STAR_$fn__759.invoke(disruptor.clj:94)
+~[storm-core-0.9.3.jar:0.9.3]
+ at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
+~[storm-core-0.9.3.jar:0.9.3]
+ at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
+ at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
+Caused by: java.lang.NullPointerException: null
+ at clojure.lang.RT.intCast(RT.java:1087) ~[clojure-1.5.1.jar:na]
+ at
+backtype.storm.daemon.worker$mk_transfer_fn$fn__3548.invoke(worker.clj:129)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__3282.invoke(executor.clj:258)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.disruptor$clojure_handler$reify__746.onEvent(disruptor.clj:58)
+~[storm-core-0.9.3.jar:0.9.3]
+ at
+backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
+~[storm-core-0.9.3.jar:0.9.3]
+ ... 6 common frames omitted
+```
+
Solution:
* This is caused by having multiple threads issue methods on the
`OutputCollector`. All emits, acks, and fails must happen on the same thread.
One subtle way this can happen is if you make a `IBasicBolt` that emits on a
separate thread. `IBasicBolt`'s automatically ack after execute is called, so
this would cause multiple threads to use the `OutputCollector` leading to this
exception. When using a basic bolt, all emits must happen in the same thread
that runs `execute`.
Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md Thu Mar 17
22:05:29 2016
@@ -140,23 +140,28 @@ As you can see, the implementation is ve
public static class ExclamationBolt implements IRichBolt {
OutputCollector _collector;
+ @Override
public void prepare(Map conf, TopologyContext context, OutputCollector
collector) {
_collector = collector;
}
+ @Override
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
+ @Override
public void cleanup() {
}
+ @Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
- public Map getComponentConfiguration() {
+ @Override
+ public Map<String, Object> getComponentConfiguration() {
return null;
}
}
@@ -164,9 +169,9 @@ public static class ExclamationBolt impl
The `prepare` method provides the bolt with an `OutputCollector` that is used
for emitting tuples from this bolt. Tuples can be emitted at anytime from the
bolt -- in the `prepare`, `execute`, or `cleanup` methods, or even
asynchronously in another thread. This `prepare` implementation simply saves
the `OutputCollector` as an instance variable to be used later on in the
`execute` method.
-The `execute` method receives a tuple from one of the bolt's inputs. The
`ExclamationBolt` grabs the first field from the tuple and emits a new tuple
with the string "!!!" appended to it. If you implement a bolt that subscribes
to multiple input sources, you can find out which component the
[Tuple](javadocs/backtype/storm/tuple/Tuple.html) came from by using the
`Tuple#getSourceComponent` method.
+The `execute` method receives a tuple from one of the bolt's inputs. The
`ExclamationBolt` grabs the first field from the tuple and emits a new tuple
with the string "!!!" appended to it. If you implement a bolt that subscribes
to multiple input sources, you can find out which component the
[Tuple](/javadoc/apidocs/backtype/storm/tuple/Tuple.html) came from by using
the `Tuple#getSourceComponent` method.
-There's a few other things going in in the `execute` method, namely that the
input tuple is passed as the first argument to `emit` and the input tuple is
acked on the final line. These are part of Storm's reliability API for
guaranteeing no data loss and will be explained later in this tutorial.
+There's a few other things going on in the `execute` method, namely that the
input tuple is passed as the first argument to `emit` and the input tuple is
acked on the final line. These are part of Storm's reliability API for
guaranteeing no data loss and will be explained later in this tutorial.
The `cleanup` method is called when a Bolt is being shutdown and should
cleanup any resources that were opened. There's no guarantee that this method
will be called on the cluster: for example, if the machine the task is running
on blows up, there's no way to invoke the method. The `cleanup` method is
intended for when you run topologies in [local mode](Local-mode.html) (where a
Storm cluster is simulated in process), and you want to be able to run and kill
many topologies without suffering any resource leaks.
@@ -180,15 +185,18 @@ Methods like `cleanup` and `getComponent
public static class ExclamationBolt extends BaseRichBolt {
OutputCollector _collector;
+ @Override
public void prepare(Map conf, TopologyContext context, OutputCollector
collector) {
_collector = collector;
}
+ @Override
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
+ @Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
Thu Mar 17 22:05:29 2016
@@ -38,7 +38,7 @@ The following sections give an overview
### Number of executors (threads)
* Description: How many executors to spawn _per component_.
-* Configuration option: ?
+* Configuration option: None (pass ``parallelism_hint`` parameter to
``setSpout`` or ``setBolt``)
* How to set in your code (examples):
*
[TopologyBuilder#setSpout()](javadocs/backtype/storm/topology/TopologyBuilder.html)
*
[TopologyBuilder#setBolt()](javadocs/backtype/storm/topology/TopologyBuilder.html)
@@ -57,7 +57,7 @@ Here is an example code snippet to show
```java
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
- .shuffleGrouping("blue-spout);
+ .shuffleGrouping("blue-spout");
```
In the above code we configured Storm to run the bolt ``GreenBolt`` with an
initial number of two executors and four associated tasks. Storm will run two
tasks per executor (thread). If you do not explicitly configure the number of
tasks, Storm will run by default one task per executor.
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
(original)
+++
storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
Thu Mar 17 22:05:29 2016
@@ -1,4 +1,5 @@
---
+title: Using non JVM languages with Storm
layout: documentation
version: v0.10.0
---
Modified:
storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
Binary files - no diff available.