Author: bobby
Date: Thu Mar 17 22:48:32 2016
New Revision: 1735516
URL: http://svn.apache.org/viewvc?rev=1735516&view=rev
Log:
Pulled in some more files from asf_site in git
Added:
storm/branches/bobby-versioned-site/Powered-By.md
storm/branches/bobby-versioned-site/releases/0.10.0/flux.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-hbase.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-hdfs.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-hive.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-jdbc.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-kafka.md
storm/branches/bobby-versioned-site/releases/0.10.0/storm-redis.md
Modified:
storm/branches/bobby-versioned-site/getting-help.md
storm/branches/bobby-versioned-site/index.html
Added: storm/branches/bobby-versioned-site/Powered-By.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/Powered-By.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/Powered-By.md (added)
+++ storm/branches/bobby-versioned-site/Powered-By.md Thu Mar 17 22:48:32 2016
@@ -0,0 +1,1040 @@
+---
+title: Companies Using Apache Storm
+layout: documentation
+documentation: true
+---
+Want to be added to this page? Send an email
[here](mailto:[email protected]).
+
+<table class="table table-striped">
+
+<tr>
+<td>
+<a href="http://groupon.com">Groupon</a>
+</td>
+<td>
+<p>
+At Groupon we use Storm to build real-time data integration systems. Storm
helps us analyze, clean, normalize, and resolve large amounts of non-unique
data points with low latency and high throughput.
+</p>
+</td>
+</tr>
+
+<tr>
+<td><a href="http://www.weather.com/">The Weather Channel</a></td>
+<td>
+<p>At The Weather Channel we use several Storm topologies to ingest and persist
weather data. Each topology is responsible for fetching one dataset from an
internal or external network (the Internet), reshaping the records for use by
our company, and persisting the records to relational databases. It is
particularly useful to have an automatic mechanism for repeating attempts to
download and manipulate the data when there is a hiccup.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.fullcontact.com/">FullContact</a>
+</td>
+<td>
+<p>
+At FullContact we currently use Storm as the backbone of the system which
synchronizes our Cloud Address Book with third party services such as Google
Contacts and Salesforce. We also use it to provide real-time support for our
contact graph analysis and federated contact search systems.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://twitter.com">Twitter</a>
+</td>
+<td>
+<p>
+Storm powers a wide variety of Twitter systems, with applications ranging from
discovery, realtime analytics, personalization, and search to revenue
optimization and many more. Storm integrates with the rest of Twitter's
infrastructure, including database systems (Cassandra, Memcached, etc.), the
messaging
infrastructure, Mesos, and the monitoring/alerting systems. Storm's isolation
scheduler makes it easy to use the same cluster both for production
applications and in-development applications, and it provides a sane way to do
capacity planning.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.com">Yahoo!</a>
+</td>
+<td>
+<p>
+Yahoo! is developing a next generation platform that enables the convergence
of big-data and low-latency processing. While Hadoop is our primary technology
for batch processing, Storm empowers stream/micro-batch processing of user
events, content feeds, and application logs.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.co.jp/">Yahoo! JAPAN</a>
+</td>
+<td>
+<p>
+Yahoo! JAPAN is a leading web portal in Japan. Storm applications process
various streaming data such as logs and social data. We use Storm to
feed content, monitor systems, detect trending topics, and crawl websites.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.webmd.com">WebMD</a>
+</td>
+<td>
+<p>
+We use Storm to power our Medscape Medpulse mobile application, which allows
medical professionals to follow important medical trends with Medscape's
curated Today on Twitter feed and a selection of blogs. A Storm topology
captures and processes tweets from the Twitter streaming API, enhances them
with metadata and images, performs real-time NLP, and executes several business
rules. Storm also monitors a selection of blogs in order to give our customers
real-time updates. We also use Storm for internal data pipelines to do ETL and
for our internal marketing platform where time and freshness are essential.
+</p>
+<p>
+We use Storm to power our search indexing process. We continue to discover
new use cases for Storm, and it has become one of the core components of our
technology stack.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.spotify.com">Spotify</a>
+</td>
+<td>
+<p>
+Spotify serves streaming music to over 10 million subscribers and 40 million
active users. Storm powers a wide range of real-time features at Spotify,
including music recommendation, monitoring, analytics, and ads targeting.
Together with Kafka, memcached, Cassandra, and netty-zmtp based messaging,
Storm enables us to build low-latency fault-tolerant distributed systems with
ease.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.infochimps.com">Infochimps</a>
+</td>
+<td>
+<p>
+Infochimps uses Storm as part of its Big Data Enterprise Cloud. Specifically,
it uses Storm as the basis for one of its three cloud data services -
namely, Data Delivery Services (DDS), which uses Storm to provide a
fault-tolerant and linearly scalable enterprise data collection, transport, and
complex in-stream processing cloud service.
+</p>
+
+<p>
+In much the same way that Hadoop provides batch ETL and large-scale batch
analytical processing, the Data Delivery Service provides real-time ETL and
large-scale real-time analytical processing - the perfect complement to
Hadoop (or in some cases, what you needed instead of Hadoop).
+</p>
+
+<p>
+DDS uses both Storm and Kafka along with a host of additional technologies to
provide an enterprise-class real-time stream processing solution with features
including:
+</p>
+
+<ul>
+<li>
+Integration connections to any variety of data sources in a way that is robust
yet non-invasive
+</li>
+<li>
+Optimizations for highly scalable, reliable data import and distributed ETL
(extract, transform, load), fulfilling data transport needs
+</li>
+<li>
+Developer tools for rapid development of decorators, which perform the
real-time stream processing
+</li>
+<li>
+Guaranteed delivery framework and data failover snapshots to send processed
data to analytics systems, databases, file systems, and applications with
extreme reliability
+</li>
+<li>
+Rapid solution development and deployment, along with our expert Big Data
methodology and best practices
+</li>
+</ul>
+
+<p>Infochimps has extensive experience in deploying its DDS to power
large-scale clickstream web data flows, massive Twitter stream processes,
Foursquare event processing, customer purchase data, product pricing data, and
more.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://healthmarketscience.com/">Health Market Science</a>
+</td>
+<td>
+<p>
+Health Market Science (HMS) provides data management as a service for the
healthcare industry. Storm is at the core of the HMS big data platform
functioning as the data ingestion mechanism, which orchestrates the data flow
across multiple persistence mechanisms that allow HMS to deliver Master Data
Management (MDM) and analytics capabilities for a wide range of healthcare needs:
compliance, integrity, data quality, and operational decision support.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.verisigninc.com/">Verisign</a>
+</td>
+<td>
+<p>
+Verisign, a global leader in domain names and Internet security, enables
Internet navigation for many of the world's most recognized domain names and
provides protection for enterprises around the world. Ensuring the security,
stability, and resiliency of key Internet infrastructure and services,
including the .COM and .NET top level domains and two of the Internet's DNS
root servers, is at the heart of Verisign's mission. Storm is a component of
our data analytics stack that powers a variety of real-time applications. One
example is security monitoring where we are leveraging Storm to analyze the
network telemetry data of our globally distributed infrastructure in order to
detect and mitigate cyber attacks.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://cerner.com/">Cerner</a>
+</td>
+<td>
+<p>
+Cerner is a leader in health care information technology. We have been using
Storm since its release to process massive amounts of clinical data in
real-time. Storm integrates well in our architecture, allowing us to quickly
provide clinicians with the data they need to make medical decisions.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.aeris.com/">Aeris Communications</a>
+</td>
+<td>
+<p>
+Aeris Communications has the only cellular network that was designed and built
exclusively for machines. Our ability to provide scalable, reliable real-time
analytics - powered by Storm - for machine to machine (M2M) communication
offers immense value to our customers. We have been using Storm in production since
Q1 of 2013.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://flipboard.com/">Flipboard</a>
+</td>
+<td>
+<p>
+Flipboard is the world's first social magazine, a single place to keep up
with everything you care about and collect it in ways that reflect you.
Inspired by the beauty and ease of print media, Flipboard is designed so you
can easily flip through news from around the world or stories from right at
home, helping people find the one thing that can inform, entertain or even
inspire them every day.
+</p>
+<p>
+We are using Storm across a wide range of our services from content search, to
realtime analytics, to generating custom magazine feeds. We then integrate
Storm across our infrastructure within systems like ElasticSearch, HBase,
Hadoop and HDFS to create a highly scalable data platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.rubiconproject.com/">Rubicon Project</a>
+</td>
+<td>
+<p>
+Storm is being used in production mode at the Rubicon Project to analyze the
results of auctions of ad impressions on its RTB exchange as they occur. It is
currently processing around 650 million auction results in three data centers
daily (with 3 separate Storm clusters). One simple application is identifying
new creatives (ads) in real time for ad quality purposes. A more sophisticated
application is an "Inventory Valuation Service" that uses DRPC to return
appraisals of new impressions before the auction takes place. The appraisals
are used for various optimization problems, such as deciding whether to auction
an impression or skip it when close to maximum capacity.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ooyala.com/">Ooyala</a>
+</td>
+<td>
+<p>
+Ooyala powers personalized multi-screen video experiences for some of the
world's largest networks, brands and media companies. We provide all the
technology and tools our customers need to manage, distribute and monetize
digital video content at a global scale.
+</p>
+
+<p>
+At the core of our technology is an analytics engine that processes over two
billion analytics events each day, derived from nearly 200 million viewers
worldwide who watch video on an Ooyala-powered player.
+</p>
+
+<p>
+Ooyala will be deploying Storm in production to give our customers real-time
streaming analytics on consumer viewing behavior and digital content trends.
Storm enables us to rapidly mine one of the world's largest online video data
sets to deliver up-to-the-minute business intelligence ranging from real-time
viewing patterns to personalized content recommendations to dynamic programming
guides and dozens of other insights for maximizing revenue with online video.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.taobao.com/index_global.php">Taobao</a>
+</td>
+<td>
+<p>
+We compute statistics from logs and extract useful information from them in
near real time with Storm. Logs are read from Kafka-like persistent
message queues into spouts, then processed and emitted through the topologies
to compute the desired results, which are then stored in distributed databases
to be used elsewhere. Input log counts vary from 2 million to 1.5 billion per
day across projects, with sizes of up to 2 terabytes. The main challenge
is not only real-time processing of large data sets; storing and persisting
the results is also a challenge and needs careful design and implementation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.alibaba.com/">Alibaba</a>
+</td>
+<td>
+<p>
+Alibaba is the leading B2B e-commerce website in the world. We use Storm to
process application logs and database change data to supply realtime
stats for data apps.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://iQIYI.COM">iQIYI</a>
+</td>
+<td>
+<p>
+iQIYI is China's largest online video platform. We are using Storm in our
video advertising system, video recommendation system, log analysis system and
many other scenarios. Now we have several standalone Storm clusters, and we
also have Storm clusters on Mesos and on Yarn. Kafka-Storm integration and
Storm-HBase integration are quite common in our production environment. We
are very interested in new developments in the integration of Storm with
other applications, such as HBase, HDFS and Kafka.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.baidu.com/">Baidu</a>
+</td>
+<td>
+<p>
+Baidu offers top search technology services for websites, audio files and
images. My group uses Storm to process the search logs to supply realtime
stats for accounting pv, ar-time and so on.
+This project helps Ops determine and monitor service status and can do
great things in the future.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yelp.com/">Yelp</a>
+</td>
+<td>
+<p>
+Yelp is using Storm with <a href="http://pyleus.org/">Pyleus</a> to build a
platform for developers to consume and process high throughput streams of data
in real time. We have ongoing projects to use Storm and Pyleus for overhauling
our internal application metrics pipeline, building an automated Python profile
analysis system, and for general ETL operations. As its support for non-JVM
components matures, we hope to make Storm the standard way of processing
streaming data at Yelp.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.klout.com/">Klout</a>
+</td>
+<td>
+<p>
+Klout helps everyone discover and be recognized for their influence by
analyzing engagement with their content across social networks. Our analysis
powers a daily Klout Score on a scale from 1-100 that shows how much influence
social media users have and on what topics. We are using Storm to develop a
realtime scoring and moments generation pipeline. Leveraging Storm's intuitive
Trident abstraction we are able to create complex topologies which stream data
from our network collectors via Kafka, process it, and write it out to HDFS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.loggly.com">Loggly</a>
+</td>
+<td>
+<p>
+Loggly is the world's most popular cloud-based log management service. Our
cloud-based log management service helps DevOps and technical teams make sense
of the massive quantity of logs that are being produced by a growing number of
cloud-centric applications - in order to solve operational problems faster.
Storm is the heart of our ingestion pipeline, where it filters, parses and
analyses billions of log events all day, every day, in real time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://premise.is/">premise.is</a>
+</td>
+<td>
+<p>
+We're building a platform for alternative, bottom-up, high-granularity
econometric data capture, particularly targeting opaque developing economies
(i.e., Argentina might lie about their inflation statistics, but their black
market certainly doesn't). Basically we get to funnel hedge fund money into
improving global economic transparency.
+</p>
+<p>
+We've been using Storm in production since January 2012 as a streaming,
time-indexed web crawl + extraction + machine learning-based semantic markup
flow (about 60 physical nodes comparable to m1.large; generating a modest
25GB/hr incremental). We wanted to have an end-to-end push-based system where
new inputs get percolated through the topology in realtime and appear on the
website, with no batch jobs required in between steps. Storm has been really
integral to realizing this goal.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://www.wego.com/">Wego</a>
+</td>
+<td>
+<p>Wego is one of the world's most comprehensive travel
metasearch engines, operating in 42 markets worldwide and used by millions of
travelers to save time, pay less and travel more. We compare and display
real-time flights, hotel pricing and availability from hundreds of leading
travel sites from all around the world on one simple screen.</p>
+
+<p>At the heart of our products, Storm helps us to stream real-time
meta-search data from our partners to end-users. Since data comes from many
sources and with different timing, Storm's topology concept naturally solves
concurrency issues while helping us to continuously merge, slice and clean all
the data. Additionally, with a few tricks and tools provided in Storm, we can
easily apply incremental updates to improve the flow of our data
(1-5GB/minute).</p>
+
+<p>With its simplicity, scalability, and flexibility, Storm not only
improves our current products but, more importantly, changes the way we work
with data. Instead of keeping data static and crunching it once in a while, we
constantly move data all around, making use of different technologies,
evaluating new ideas and building new products. We stream critical data to
memory for fast access while continuously crunching and directing huge amounts
of data into various engines so that we can evaluate and make use of data
instantly. Previously, this kind of system required setting up and maintaining
quite a few things, but with Storm all we need is half a day of coding and a
few seconds to deploy. In this sense, we think of Storm not as serving our
products but rather as evolving them.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://rocketfuel.com/">RocketFuel</a>
+</td>
+<td>
+<p>
+At Rocket Fuel (an ad network) we are building a real-time platform on top of
Storm which imitates the time-critical workflows of our existing Hadoop-based
ETL pipeline. This platform tracks impressions, clicks, conversions, bid
requests, etc. in real time. We are using Kafka as our message queue. To start
with, we are pushing per-minute aggregations directly to MySQL, but we plan to
go finer than one minute and may bring HBase into the picture to handle
increased write load.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://quicklizard.com/">QuickLizard</a>
+</td>
+<td>
+<p>
+QuickLizard builds solutions for automated pricing for companies that have many
products in their lists. Prices are influenced by multiple factors internal and
external to the company.
+</p>
+
+<p>
+Currently we use Storm to choose products that need to be priced. We get a
real-time stream of events from the client site and filter it to get a much
lighter stream of products that need to be processed by our procedures to get
a price recommendation.
+</p>
+
+<p>
+We also plan to use Storm for real-time data mining model calculation that
should match products described on competitor sites to client products.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://spider.io/">spider.io</a>
+</td>
+<td>
+<p>
+At spider.io we've been using Storm as a core part of our classification
engine since October 2011. We run Storm topologies to combine, analyse and
classify real-time streams of internet traffic, to identify suspicious or
undesirable website activity. Over the past 7 months we've expanded our use of
Storm, so it now manages most of our real-time processing. Our classifications
are displayed in a custom analytics dashboard, where Storm's distributed remote
procedure call interface is used to gather data from our database and metadata
services. DRPC allows us to increase the responsiveness of our user interface
by distributing processing across a cluster of Amazon EC2 instances.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://8digits.com/">8digits</a>
+</td>
+<td>
+<p>
+At 8digits, we are using Storm in our analytics engine, which is one of the
most crucial parts of our infrastructure. We are utilizing several cloud
servers with multiple cores each for the purpose of running a real-time system
making several complex calculations. Storm is a proven, solid and powerful
framework for most big-data problems.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="https://www.alipay.com/">Alipay</a>
+</td>
+<td>
+<p>
+Alipay is China's leading third-party online payment platform. We are using
Storm in many scenarios:
+</p>
+
+<ol>
+<li>
+Calculating realtime trade quantity, trade amount, top-N seller trading
information, and user registration count: more than 100 million messages per day.
+</li>
+<li>
+Log processing, more than 6T data per day.
+</li>
+</ol>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://navisite.com/">NaviSite</a>
+</td>
+<td>
+<p>
+We are using Storm as part of our server event log monitoring/auditing system.
We send log messages from thousands of servers into a RabbitMQ cluster and
then use Storm to check each message against a set of regular expressions. If
there is a match (fewer than 1% of messages), then the message is sent to a
bolt that stores data in a Mongo database. Right now we are handling a load of
somewhere around 5-10k messages per second; however, we have tested our
existing RabbitMQ + Storm clusters up to about 50k per second. We have plans to
do real time
intrusion detection as an enhancement to the current log message reporting
system.
+</p>
+
+<p>
+We have Storm deployed on the NaviSite Cloud platform. We have a ZK cluster
of 3 small VMs, 1 Nimbus VM and 16 dual core/4GB VMs as supervisors.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.paywithglyph.com">Glyph</a>
+</td>
+<td>
+<p>
+Glyph is in the business of providing credit card rewards intelligence to
consumers. At a given point of sale, Glyph suggests to its users the best
cards to use at that merchant location to earn maximum rewards. Glyph also
suggests which cards the user should carry to earn maximum rewards based on
their personal spending habits. Glyph provides this
information to the user by retrieving and analyzing credit card transactions
from banks. Storm is used in Glyph to perform this retrieval and analysis in
realtime. We are using Memcached in conjunction with Storm for handling
sessions. We are impressed by how Storm makes high availability and reliability
of Glyph services possible. We are now using Storm and Clojure in building
Glyph data analytics and insights services. We have open-sourced the node-drpc
wrapper module for easy Storm DRPC integration with NodeJS.
+</p>
+</td>
+</tr>
+<tr>
+<td>
+<a href="http://heartbyte.com/">Heartbyte</a>
+</td>
+<td>
+<p>
+At Heartbyte, Storm is a central piece of our realtime audience participation
platform. We are often required to process a 'vote' per second from hundreds
of thousands of mobile devices simultaneously and process / aggregate all of
the data within a second. Further, we are finding that Storm is a great
alternative to other ingest tools for Hadoop/HBase, which we use for batch
processing after our events conclude.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://2lemetry.com/">2lemetry</a>
+</td>
+<td>
+<p>
+2lemetry uses Storm to power its real-time analytics on top of the m2m.io
offering. 2lemetry is partnered with Sprint, Verizon, AT&T, and Arrow
Electronics to power IoT applications worldwide. Some of 2lemetry's larger
projects include RTX, Kontron, and Intel. 2lemetry also works with many
professional sporting teams to parse data in real time. 2lemetry receives
events for every touch of the ball in every MLS soccer match. Storm is used to
look for trends like passing tendencies as they develop during the game.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.nodeable.com/">Nodeable</a>
+</td>
+<td>
+<p>
+Nodeable uses Storm to deliver real-time continuous computation of the data we
consume. Storm has made it significantly easier for us to scale our service
more efficiently while ensuring the data we deliver is timely and accurate.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://twitsprout.com/">TwitSprout</a>
+</td>
+<td>
+<p>
+At TwitSprout, we use Storm to analyze activity on Twitter to monitor mentions
of keywords (mostly client product and brand names) and trigger alerts when
activity around a certain keyword spikes above normal levels. We also use Storm
to back the data behind the live-infographics we produce for events sponsored
by our clients. The infographics are usually in the form of a live dashboard
that helps measure the audience buzz across social media as it relates to the
event in realtime.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.happyelements.com/">HappyElements</a>
+</td>
+<td>
+<p>
+<a href="http://www.happyelements.com">HappyElements</a> is a leading social
game developer on Facebook and other SNS platforms. We developed a real time
data analysis program based on Storm to analyze user activity in real time.
Storm is very easy to use, stable, scalable and maintainable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.idexx.com/view/xhtml/en_us/corporate/home.jsf">IDEXX
Laboratories</a>
+</td>
+<td>
+<p>
+IDEXX Laboratories is the leading maker of software and diagnostic instruments
for the veterinary market. We collect and analyze veterinary medical data from
thousands of veterinary clinics across the US. We recently embarked on a
project to upgrade our aging data processing infrastructure that was unable to
keep up with the rapid increase in the volume, velocity and variety of data
that we were processing.
+</p>
+
+<p>
+We are utilizing the Storm system to take in the data that is extracted from
the medical records in a number of different schemas, transform it into a
standard schema that we created and store it in an Oracle RDBMS database. It is
basically a souped up distributed ETL system. Storm takes on the plumbing
necessary for a distributed system and is very easy to write code for. The
ability to create small pieces of functionality and connect them together gives
us the ultimate flexibility to parallelize each of the pieces differently.
+</p>
+
+<p>
+Our current cluster consists of four supervisor machines running 110 tasks
inside 32 worker processes. We run two different topologies which receive
messages and communicate with each other via RabbitMQ. The whole thing is
deployed on Amazon Web Services and utilizes S3 for some intermediate storage,
Redis as a key/value store and Oracle RDS for RDBMS storage. The bolts are all
written in Java using the Spring framework with Hibernate as an ORM.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.umeng.com/">Umeng</a>
+</td>
+<td>
+Umeng is the leading and largest mobile app analytics and developer services
platform in China. Storm powers Umeng's realtime analytics
platform, processing billions of data points per day and growing. We also use
Storm in other products which require realtime processing, and it has become
the core infrastructure in our company.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.admaster.com.cn/">Admaster</a>
+</td>
+<td>
+<p>
+We provide monitoring and precise delivery for Internet advertising. We use
Storm to do the following:
+</p>
+
+<ol>
+<li>Calculate PV, UV of every advertisement.</li>
+<li>Simple data cleaning: filter out malformed data and cheating data (PV
above a certain value).</li>
+</ol>
+Our cluster has 8 nodes and processes several billion messages per day, about
200GB.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://socialmetrix.com/en/">SocialMetrix</a>
+</td>
+<td>
+<p>
+Since its release, Storm has been a perfect fit for our real-time monitoring
needs. Its powerful API and easy administration and deployment enabled us to
rapidly build solutions to monitor presidential elections and several major
events, and it is currently the processing core of our new product
"Socialmetrix Eventia".
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://needium.com/">Needium</a>
+</td>
+<td>
+<p>
+At Needium we love Ruby and JRuby. The Storm platform offers the right balance
between simplicity, flexibility and scalability. We created RedStorm, a Ruby
DSL for Storm, to keep on using Ruby on top of the power of Storm by leveraging
Storm's JVM foundation with JRuby. We currently use Storm as our Twitter
realtime data processing pipeline. We have Storm topologies for content
filtering, geolocalisation and classification. Storm allows us to architect
our pipeline for full Twitter firehose scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://parse.ly/">Parse.ly</a>
+</td>
+<td>
+<p>
+Parse.ly is using Storm for its web/content analytics system. We have a
home-grown data processing and storage system built with Python and Celery,
with backend stores in Redis and MongoDB. We are now using Storm for real-time
unique visitor counting and are exploring options for using it for some of our
richer data sources such as social share data and semantic content metadata.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.parc.com/">PARC</a>
+</td>
+<td>
+<p>
+The High Performance Graph Analytics & Real-time Insights Research team at PARC
uses Storm as one of the building blocks of their PARC Analytics Cloud
infrastructure, which comprises Nebula-based OpenStack, Hadoop, SAP HANA,
Storm, PARC Graph Analytics, and a machine learning toolbox. It enables
researchers to process real-time data feeds from sensors, web, network, social
media, and security traces, and to easily ingest any other real-time data feeds
of interest for PARC researchers.
+</p>
+<p>
+PARC researchers are working with a number of industry collaborators developing
new tools, algorithms, and models to analyze massive amounts of e-commerce, web
clickstreams, 3rd party syndicated data, cohort data, social media data
streams, and structured data from RDBMS, NoSQL, and NewSQL systems in near
real-time. The PARC team is developing a reference architecture and benchmarks
for their near real-time automated insight discovery platform combining the
power of all the above tools and PARC's applied research in machine learning,
graph analytics, reasoning, clustering, and contextual recommendations. The High
Performance Graph Analytics & Real-time Insights research at PARC is headed by
<a href="http://www.linkedin.com/in/skreddy">Surendra Reddy</a>. If you are
interested in learning more about our experience using Storm, in our research,
or in collaborating with PARC in this area, please feel free to contact
[email protected].
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://gumgum.com/">GumGum</a>
+</td>
+<td>
+<p>
+GumGum, the leading in-image advertising platform for publishers and brands,
uses Storm to produce real-time data. Storm and Trident-based topologies
consume various ad-related events from Kafka and persist the aggregations in
MySQL and HBase. This architecture will eventually replace most existing daily
Hadoop MapReduce jobs. There are also plans for Kafka + Storm to replace the
existing distributed queue processing infrastructure built with Amazon SQS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.crowdflower.com/">CrowdFlower</a>
+</td>
+<td>
+<p>
+CrowdFlower is using Storm with Kafka to generalize our data stream
+aggregation and realtime computation infrastructure. We replaced our
+homegrown aggregation solutions with Storm because it simplified the
+creation of fault-tolerant systems. We were already using ZooKeeper
+and Kafka, so Storm allowed us to build more generic abstractions for
+our analytics using tools that we had already deployed and
+battle-tested in production.
+</p>
+
+<p>
+We are currently writing to DynamoDB from Storm, so we are able to
+scale our capacity quickly by bringing up additional supervisors and
+tweaking the throughput on our Dynamo tables. We look forward to
+exploring other uses for Storm in our system, especially with the
+recent release of Trident.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.dsbox.com">Digital Sandbox</a>
+</td>
+<td>
+<p>
+At Digital Sandbox we use Storm to enable our open source information feed
monitoring system. The system uses Storm to constantly monitor and pull data
from structured and unstructured information sources across the internet. For
each found item, our topology applies natural language processing based concept
analysis, temporal analysis, geospatial analytics and a prioritization
algorithm to enable users to monitor large special events, public safety
operations, and topics of interest to a multitude of individual users and teams.
+</p>
+
+<p>
+Our system is built using Storm for feed retrieval and annotation, Python with
Flask and jQuery for business logic and web interfaces, and MongoDB for data
persistence. We use NLTK for natural language processing and the WordNet,
GeoNames, and OpenStreetMap databases to enable feed item concept extraction
and geolocation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://hallo.me/">Hallo</a>
+</td>
+<td>
+With several mainstream celebrities and very popular YouTubers using Hallo to
communicate with their fans, we needed a good solution to notify users via push
notifications and make sure that the celebrity messages were delivered to
follower timelines in near realtime. Our initial approach for broadcast push
notifications would take anywhere from 2-3 hours. After re-engineering our
solution on top of Storm, that time has been cut down to 5 minutes on a very
small cluster. With the user base growing and user need for realtime
communication, we are very happy knowing that we can easily scale Storm by
adding nodes to maintain a baseline QoS for our users.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://keepcon.com/">Keepcon</a>
+</td>
+<td>
+We provide moderation services for classifieds, kids' communities, newspapers,
chat rooms, Facebook fan pages, YouTube channels, reviews, and all kinds of UGC.
We use Storm to integrate with our clients, find evidence within each
text, persist results to Cassandra and Elasticsearch, and send results back to
our clients.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.visiblemeasures.com/">Visible Measures</a>
+</td>
+<td>
+<p>
+Visible Measures powers video campaigns and analytics for publishers and
+advertisers, tracking data for hundreds of millions of videos and billions
+of views. We are using Storm to process viewing behavior data in real time and
+make the information immediately available to our customers. We read events from
+various push and pull sources, including a Kestrel queue, filter and
+enrich the events in Storm topologies, and persist the events to Redis,
+HDFS and Vertica for real-time analytics and archiving. We are currently
+experimenting with Trident topologies, and figuring out how to move more
+of our Hadoop-based batch processing into Storm.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.o2mc.eu/en/">O2mc</a>
+</td>
+<td>
+<p>
+One of the core products of O2mc is called O2mc Community. O2mc Community
performs multilingual, realtime sentiment analysis with very low latency and
distributes the analyzed results to numerous clients. The input is extracted
from source systems like Twitter, Facebook, e-mail and many more. After the
analysis has taken place on Storm, the results are streamed to any output
system ranging from HTTP streaming to clients to direct database insertion to
an external business process engine to kickstart a process.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.theladders.com">The Ladders</a>
+</td>
+<td>
+<p>
+TheLadders has been committed to finding the right person for the right job
since 2003. We're using Storm in a variety of ways and are happy with its
versatility, robustness, and ease of development. We use Storm in conjunction
with RabbitMQ for such things as sending hiring alerts: when a recruiter
submits a job to our site, Storm processes that event and will aggregate
jobseekers whose profiles match the position. That list is subsequently
batch-processed to send an email to the list of jobseekers. We also use Storm
to persist events for Business Intelligence and internal event tracking. We're
continuing to find uses for Storm where fast, asynchronous, real-time event
processing is a must.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://semlab.nl">SemLab</a>
+</td>
+<td>
+<p>
+SemLab develops software for knowledge discovery and information support. Our
ViewerPro platform uses information extraction, natural language processing and
semantic web technologies to extract structured data from unstructured sources,
in domains such as financial news feeds and legal documents. We have
successfully adapted ViewerPro's processing framework to run on top of Storm.
The transition to Storm has made ViewerPro a much more scalable product,
allowing us to process more in less time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://visualrevenue.com/">Visual Revenue</a>
+</td>
+<td>
+<p>
+Here at Visual Revenue, we built a decision support system to help online
editors make choices on what, when, and where to promote their content in
real time. Storm is the backbone of our real-time data processing and aggregation
pipelines.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.peerindex.com/">PeerIndex</a>
+</td>
+<td>
+<p>
+PeerIndex is working to deliver influence at scale. PeerIndex does this by
exposing services built on top of our Influence Graph, a directed graph of who
is influencing whom on the web. PeerIndex gathers data from a number of social
networks to create the Influence Graph. We use Storm to process our social
data, to provide real-time aggregations, and to crawl the web, before storing
our data in a manner most suitable for our Hadoop-based systems to batch
process. Storm provided us with an intuitive API and has slotted in well with
the rest of our architecture. PeerIndex looks forward to further investing
resources into our Storm-based real-time analytics.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://ants.vn">ANTS.VN</a>
+</td>
+<td>
+<p>
+Big Data in Advertising is Vietnam's unique platform that combines ad serving, a
real-time bidding (RTB) exchange, an ad server, analytics, yield optimization, and
content valuation to deliver the highest revenue across every desktop, tablet,
and mobile screen. At ANTS.VN we use Storm to process large amounts of data,
provide data in real time, and improve our ad quality. This platform tracks
impressions, clicks, conversions, bid requests, etc. in real time. Together with
Kafka, Redis, memcached, and Cassandra-based messaging, Storm enables us to
build low-latency, fault-tolerant distributed systems with ease.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wayfair.com">Wayfair</a>
+</td>
+<td>
+<p>
+At Wayfair, we use Storm as a platform to drive our core order processing
pipeline as an event-driven system. Storm allows us to reliably process tens of
thousands of orders daily while providing us the assurance of seamless process
scalability as our order load increases. Given the project's ease of use and
the immense support of the community, we've managed to implement our bolts in
PHP, construct a simple Puppet module for configuration management, and quickly
solve arising issues. We can now focus most of our development efforts on the
business layer. Check out more information on how we use Storm <a
href="http://engineering.wayfair.com/stormin-oms/">in our engineering blog</a>.
</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://innoquant.com/">InnoQuant</a>
+</td>
+<td>
+<p>
+At InnoQuant, we use Storm as the backbone of our real-time big data analytics
engine in the MOCA platform. MOCA is a next generation, mobile-backend-as-a-service
platform (MBaaS). It provides brands and app developers with real-time in-app
tracking, context-aware push messaging, user micro-segmentation based on
profile, time and geo-context, as well as big data analytics. The Storm-based
pipeline is fed with events captured by native mobile SDKs (iOS, Android),
scales nicely with connected mobile app users, delivers stream-based metrics
and aggregations, and finally integrates with the rest of the MOCA infrastructure,
including columnar storage (Cassandra) and graph storage (Titan).
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.fliptop.com/">Fliptop</a>
+</td>
+<td>
+<p>
+Fliptop is a customer intelligence platform which allows customers to
integrate their contacts and campaign data, to enhance their prospects with
social identities, and to find their best leads and most influential
customers. We have been using Storm for various tasks which require
scalability and reliability, including integrating with sales/marketing
platforms, data appending for contacts/leads, and computing scores for
contacts/leads. It is one of the most robust and scalable parts of our infrastructure.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.trovit.com/">Trovit</a>
+</td>
+<td>
+<p>
+Trovit is a search engine for classified ads present in 39 countries and
different business categories (Real Estate, Cars, Jobs, Rentals, Products and
Deals). Currently we use Storm to process and index ads in a distributed and
low-latency fashion. Combining it with other technologies like Hadoop, HBase, and
Solr has allowed us to build a scalable and low-latency platform to serve
search results to the end user.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.openx.com/">OpenX</a>
+</td>
+<td>
+<p>
+OpenX is a unique platform that combines ad serving, a real-time bidding (RTB)
exchange, yield optimization, and content valuation to deliver the highest
revenue across every desktop, tablet, and mobile screen.
+At OpenX we use Storm to process large amounts of data to provide real-time
analytics. Storm lets us process data in real time to improve our ad
quality.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://keen.io/">Keen IO</a>
+</td>
+<td>
+<p>
+Keen IO is an analytics backend-as-a-service. The Keen IO API makes it easy
for customers to do internal analytics or expose analytics features to their
customers. Keen IO uses Storm (DRPC) to query billion-event data sets at very
low latencies. We also use Storm to control our ingestion pipeline, sourcing
data from Kafka and storing it in Cassandra.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://liveperson.com/">LivePerson</a>
+</td>
+<td>
+<p>
+LivePerson is a provider of interaction services over the web. Interaction
between an agent and a visitor on a site can be achieved using phone calls, chat,
banners, etc. Using Storm, LivePerson can collect and process visitor data and
provide information in real time to the agents about visitor behavior.
Moreover, LivePerson can make better decisions about how to react to visitors in
a way that best addresses their needs.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://yieldbot.com/">YieldBot</a>
+</td>
+<td>
+<p>
+Yieldbot connects ads to the real-time consumer intent streaming within
premium publishers. To do this, Yieldbot leverages Storm for a wide variety of
real-time processing tasks. We've open sourced our Clojure DSL for writing
Trident topologies, marceline, which we use extensively. Events are read from
Kafka, most state is stored in Cassandra, and we heavily use Storm's DRPC
features. Our Storm use cases range from HTML processing, to hotness-style
trending, to probabilistic rankings and cardinalities. Storm topologies touch
virtually all of the events generated by the Yieldbot platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://equinix.com/">Equinix</a>
+</td>
+<td>
+<p>
+At Equinix, we use a number of Storm topologies to process and persist various
data streams generated by sensors in our data centers. We also use Storm for
real-time monitoring of different infrastructure components. A few other
topologies are used for processing logs in real time for internal IT systems,
which also provide insights into user behavior.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://minewhat.com/">MineWhat</a>
+</td>
+<td>
+<p>
+MineWhat provides actionable analytics for ecommerce spanning every SKU, brand,
and category in the store. We use Storm to process raw click stream ingestion
from Kafka and compute live analytics. Storm topologies power our complex
product-to-user interaction analysis. The multi-language feature in Storm is really
kick-ass: we have bolts written in Node.js, Python, and Ruby. Storm has been in
our production site since Nov 2012.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.360.cn/">Qihoo 360</a>
+</td>
+<td>
+<p>
+360 has deployed about 50 realtime applications on top of Storm, including web
page analysis, log processing, image processing, voice processing, etc.
+</p>
+<p>
+The use case of Storm at 360 is a bit special since we deployed Storm on
thousands of servers which are not dedicated to Storm. Storm uses just a little
CPU/memory/network resource on each server. However, these Storm clusters
leverage idle resources of servers at nearly zero cost to provide great
computing power, and it's realtime. It's amazing.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.holidaycheck.com/">HolidayCheck</a>
+</td>
+<td>
+<p>
+HolidayCheck is an online travel site and agency available in 10
+languages worldwide visited by 30 million people a month.
+We use Storm to deliver real-time hotel and holiday package offers
+from multiple providers - reservation systems and affiliate travel
+networks - in a low latency fashion based on user-selected criteria.
+In further reservation steps we use DRPC for vacancy checks and
+bookings of chosen offers. Along with Storm in the system for offers
+delivery we use Scala, Akka, Hazelcast, Drools and MongoDB. Real-time
+offer stream is delivered outside of the system back to the front-end
+via websocket connections.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://dataminelab.com/">DataMine Lab</a>
+</td>
+<td>
+<p>
+DataMine Lab is a consulting company integrating Storm into its
+portfolio of technologies. Storm powers a range of our customers'
+systems, allowing us to build real-time analytics on tens of millions
+of visitors to the advertising platforms we helped to create. Together
+with Redis, Cassandra and Hadoop, Storm allows us to provide a real-time
+distributed data platform at a global scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wizecommerce.com/">Wize Commerce</a>
+</td>
+<td>
+<p>
+Wize Commerce® is the smartest way to grow your digital business. For over
ten years, we have been helping clients maximize their revenue and traffic
using optimization technologies that operate at massive scale, and across
digital ecosystems. We own and operate leading comparison shopping engines
including Nextag®, PriceMachine™, and <a
href="http://guenstiger.de">guenstiger.de</a>, and provide services to a wide
ecosystem of partner sites that use our e-commerce platform. These sites
together drive over $1B in annual merchant sales.
+</p>
+<p>
+We use Storm to power our core platform infrastructure, and it has become a
vital component of our search indexing system and Cassandra storage. Along with
Kafka, Storm has reduced our end-to-end latencies from several hours to a few
minutes. As one of the largest comparison shopping site operators, we consider
pushing price updates to the live site very important, and Storm helps a lot in
achieving that. We have been using Storm extensively in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://metamarkets.com">Metamarkets</a>
+</td>
+<td>
+<p>At Metamarkets, Apache Storm is used to process real-time event data
streamed from Apache Kafka message brokers, and then to load that data into a
<a href="http://druid.io">Druid cluster</a>, the low-latency data store at the
heart of our real-time analytics service. Our Storm topologies perform various
operations, ranging from simple filtering of "outdated" events, to
transformations such as ID-to-name lookups, to complex multi-stream joins.
Since our service is intended to respond to ad-hoc queries within seconds of
ingesting events, the speed, flexibility, and robustness of those topologies
make Storm a key piece of our real-time stack.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mightytravels.com">Mighty Travels</a>
+</td>
+<td>
+<p>We are using Storm to process real-time search data stream and
+application logs. The part we like best about Storm is the ease of
+scaling up basically just by throwing more machines at it.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.polecat.co">Polecat</a>
+</td>
+<td>
+<p>Polecat's digital analysis platform, MeaningMine, allows users to search
all on-line news, blogs and social media in real-time and run bespoke analysis
in order to inform corporate strategy and decision making for some of the world's
largest companies and governmental organisations.</p>
+<p>
+Polecat uses Storm to run an application we've called the 'Data Munger'. We
run many different topologies on a multi-host Storm cluster to process tens of
millions of online articles and posts that we collect each day. Storm handles
our analysis of these documents so that we can provide insight on realtime data
to our clients. We output our results from Storm into one of many large Apache
Solr clusters for our end user applications to query (Polecat is also a
contributor to Solr). We first started developing our app to run on Storm
back in June 2012 and it has been live since roughly September 2012. We've
found Storm to be an excellent fit for our needs here, and we've always found
it extremely robust and fast.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.skylight.io/">Skylight by Tilde</a>
+</td>
+<td>
+<p>Skylight is a production profiler for Ruby on Rails apps that focuses on
providing detailed information about your running application that you can
explore in an intuitive way. We use Storm to process traces from our agent into
data structures that we can slice and dice for you in our web app.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ad4game.com/">Ad4Game</a>
+</td>
+<td>
+<p>We are an advertising network and we use Storm to calculate priorities in
real time to know which ads to show for which website, visitor and country.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.impetus.com/">Impetus Technologies</a>
+</td>
+<td>
+<p>StreamAnalytix, a product of Impetus Technologies, enables enterprises to
analyze and respond to events in real-time at Big Data scale. Based on Apache
Storm, StreamAnalytix is designed to rapidly build and deploy streaming
analytics applications for any industry vertical, any data format, and any use
case. This high-performance scalable platform comes with a pre-integrated
package of components like Cassandra, Storm, Kafka and more. In addition, it
also brings together the proven open source technology stack with Hadoop and
NoSQL to provide massive scalability, dynamic data pipelines, and a visual
designer for rapid application development.</p>
+<p>
+Through StreamAnalytix, users can ingest, store and analyze millions of
events per second and discover exceptions, patterns, and trends through live
dashboards. It also provides seamless integration with an indexing store
(Elasticsearch) and NoSQL databases (HBase, Cassandra, and Oracle NoSQL) for
writing data in real time. With the use of Storm, the product delivers high
business value solutions such as log analytics, streaming ETL, deep social
listening, real-time marketing, business process acceleration and predictive
maintenance.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.akazoo.com/en">Akazoo</a>
+</td>
+<td>
+<p>
+Akazoo is a platform providing music streaming services. Storm is the
backbone of all our real-time analytical processing. We use it for tracking and
analyzing application events and for various other stuff, including
recommendations and parallel task execution.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mapillary.com">Mapillary</a>
+</td>
+<td>
+<p>
+At Mapillary we use Storm for a wide variety of tasks. Having a system which
is 100% based on Kafka input, Storm, and Trident makes reasoning about our data a
breeze.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.gutscheinrausch.de/">Gutscheinrausch.de</a>
+</td>
+<td>
+<p>
+We recently upgraded our existing IT infrastructure, using Storm as one of our
main tools.
+Each day we collect sales, clicks, visits and various ecommerce metrics from
various different systems (webpages, affiliate reportings, networks,
tracking-scripts etc). We process this continually generated data using Storm
before entering it into the backend systems for further use.
+</p>
+<p>
+Using Storm we were able to decouple our heterogeneous frontend-systems from
our backends and take load off the data warehouse applications by inputting
pre-processed data. This way we can easily collect and process all data and then
do realtime OLAP queries using our proprietary data warehouse technology.
+</p>
+<p>
+We are mostly impressed by the high speed, low maintenance approach Storm has
provided us with. Also being able to easily scale up the system using more
machines is a big plus. Since we're a small team it allows us to focus more on
our core business instead of the underlying technology. You could say it has
taken our hearts by storm!
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.appriver.com">AppRiver</a>
+</td>
+<td>
+<p>
+We are using Storm to track internet threats from varied sources around the
web. It is always fast and reliable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mercadolibre.com/">MercadoLibre</a>
+</td>
+<td>
+</td>
+</tr>
+
+
+</table>
Modified: storm/branches/bobby-versioned-site/getting-help.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/getting-help.md?rev=1735516&r1=1735515&r2=1735516&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/getting-help.md (original)
+++ storm/branches/bobby-versioned-site/getting-help.md Thu Mar 17 22:48:32 2016
@@ -1,7 +1,8 @@
---
layout: default
-title: Getting help
+title: Documentation
---
+### Getting help
__NOTE:__ The google groups account [email protected] is now
officially deprecated in favor of the Apache-hosted user/dev mailing lists.
Modified: storm/branches/bobby-versioned-site/index.html
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/index.html?rev=1735516&r1=1735515&r2=1735516&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/index.html (original)
+++ storm/branches/bobby-versioned-site/index.html Thu Mar 17 22:48:32 2016
@@ -90,7 +90,7 @@ title: Apache Storm
<a href="http://www.wego.com/"><img
src="images/logos/wego.jpg" class="img-responsive"></a>
</div>
<div>
- <a href="/documentation/Powered-By.html" target="blank"
class="pull-right" style="font-size: 18px;">and many others</a>
+ <a href="/Powered-By.html" target="blank" class="pull-right"
style="font-size: 18px;">and many others</a>
</div>
</div>
</div>
Added: storm/branches/bobby-versioned-site/releases/0.10.0/flux.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/flux.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/flux.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/flux.md Thu Mar 17
22:48:32 2016
@@ -0,0 +1,836 @@
+---
+title: Flux
+layout: documentation
+documentation: true
+version: v0.10.0
+---
+
+A framework for creating and deploying Apache Storm streaming computations
with less friction.
+
+## Definition
+**flux** |fləks| _noun_
+
+1. The action or process of flowing or flowing out
+2. Continuous change
+3. In physics, the rate of flow of a fluid, radiant energy, or particles
across a given area
+4. A substance mixed with a solid to lower its melting point
+
+## Rationale
+Bad things happen when configuration is hard-coded. No one should have to
recompile or repackage an application in
+order to change configuration.
+
+## About
+Flux is a framework and set of utilities that make defining and deploying
Apache Storm topologies less painful and
+developer-intensive.
+
+Have you ever found yourself repeating this pattern?:
+
+```java
+
+public static void main(String[] args) throws Exception {
+    // logic to determine if we're running locally or not...
+    // create necessary config options...
+    boolean runLocal = shouldRunLocal();
+    if (runLocal) {
+        LocalCluster cluster = new LocalCluster();
+        cluster.submitTopology(name, conf, topology);
+    } else {
+        StormSubmitter.submitTopology(name, conf, topology);
+    }
+}
+```
+
+Wouldn't something like this be easier:
+
+```bash
+storm jar mytopology.jar org.apache.storm.flux.Flux --local config.yaml
+```
+
+or:
+
+```bash
+storm jar mytopology.jar org.apache.storm.flux.Flux --remote config.yaml
+```
+
+Another pain point often mentioned is the fact that the wiring for a Topology
graph is often tied up in Java code,
+and that any changes require recompilation and repackaging of the topology jar
file. Flux aims to alleviate that
+pain by allowing you to package all your Storm components in a single jar, and
use an external text file to define
+the layout and configuration of your topologies.
+
+## Features
+
+ * Easily configure and deploy Storm topologies (Both Storm core and
Microbatch API) without embedding configuration
+ in your topology code
+ * Support for existing topology code (see below)
+ * Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
+ * YAML DSL support for most Storm components (storm-kafka, storm-hdfs,
storm-hbase, etc.)
+ * Convenient support for multi-lang components
+ * External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style
+ `${variable.name}` substitution)
+
+## Usage
+
+To use Flux, add it as a dependency and package all your Storm components in a
fat jar, then create a YAML document
+to define your topology (see below for YAML configuration options).
+
+### Building from Source
+The easiest way to use Flux is to add it as a Maven dependency in your project
as described below.
+
+If you would like to build Flux from source and run the unit/integration
tests, you will need the following installed
+on your system:
+
+* Python 2.6.x or later
+* Node.js 0.10.x or later
+
+#### Building with unit tests enabled:
+
+```
+mvn clean install
+```
+
+#### Building with unit tests disabled:
+If you would like to build Flux without installing Python or Node.js you can
simply skip the unit tests:
+
+```
+mvn clean install -DskipTests=true
+```
+
+Note that if you plan on using Flux to deploy topologies to a remote cluster,
you will still need to have Python
+installed since it is required by Apache Storm.
+
+
+#### Building with integration tests enabled:
+
+```
+mvn clean install -DskipIntegration=false
+```
+
+
+### Packaging with Maven
+To enable Flux for your Storm components, you need to add it as a dependency
such that it's included in the Storm
+topology jar. This can be accomplished with the Maven shade plugin (preferred)
or the Maven assembly plugin (not
+recommended).
+
+#### Flux Maven Dependency
+The current version of Flux is available in Maven Central at the following
coordinates:
+```xml
+<dependency>
+    <groupId>org.apache.storm</groupId>
+    <artifactId>flux-core</artifactId>
+    <version>${storm.version}</version>
+</dependency>
+```
+
+#### Creating a Flux-Enabled Topology JAR
+The example below illustrates Flux usage with the Maven shade plugin:
+
+ ```xml
+<!-- include Flux and user dependencies in the shaded jar -->
+<dependencies>
+    <!-- Flux include -->
+    <dependency>
+        <groupId>org.apache.storm</groupId>
+        <artifactId>flux-core</artifactId>
+        <version>${storm.version}</version>
+    </dependency>
+
+    <!-- add user dependencies here... -->
+
+</dependencies>
+<!-- create a fat jar that includes all dependencies -->
+<build>
+    <plugins>
+        <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-shade-plugin</artifactId>
+            <version>1.4</version>
+            <configuration>
+                <createDependencyReducedPom>true</createDependencyReducedPom>
+            </configuration>
+            <executions>
+                <execution>
+                    <phase>package</phase>
+                    <goals>
+                        <goal>shade</goal>
+                    </goals>
+                    <configuration>
+                        <transformers>
+                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+                                <mainClass>org.apache.storm.flux.Flux</mainClass>
+                            </transformer>
+                        </transformers>
+                    </configuration>
+                </execution>
+            </executions>
+        </plugin>
+    </plugins>
+</build>
+ ```
+
+### Deploying and Running a Flux Topology
+Once your topology components are packaged with the Flux dependency, you can
run different topologies either locally
+or remotely using the `storm jar` command. For example, if your fat jar is
named `myTopology-0.1.0-SNAPSHOT.jar` you
+could run it locally with the command:
+
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local my_config.yaml
+
+```
+
+### Command line options
+```
+usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux
+             [options] <topology-config.yaml>
+ -d,--dry-run                 Do not run or deploy the topology. Just
+                              build, validate, and print information about
+                              the topology.
+ -e,--env-filter              Perform environment variable substitution.
+                              Replace keys identified with `${ENV-[NAME]}`
+                              will be replaced with the corresponding
+                              `NAME` environment value
+ -f,--filter <file>           Perform property substitution. Use the
+                              specified file as a source of properties,
+                              and replace keys identified with {$[property
+                              name]} with the value defined in the
+                              properties file.
+ -i,--inactive                Deploy the topology, but do not activate it.
+ -l,--local                   Run the topology in local mode.
+ -n,--no-splash               Suppress the printing of the splash screen.
+ -q,--no-detail               Suppress the printing of topology details.
+ -r,--remote                  Deploy the topology to a remote cluster.
+ -R,--resource                Treat the supplied path as a classpath
+                              resource instead of a file.
+ -s,--sleep <ms>              When running locally, the amount of time to
+                              sleep (in ms.) before killing the topology
+                              and shutting down the local cluster.
+ -z,--zookeeper <host:port>   When running in local mode, use the
+                              ZooKeeper at the specified <host>:<port>
+                              instead of the in-process ZooKeeper.
+                              (requires Storm 0.9.3 or later)
+```
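The `--filter` and `--env-filter` options both describe token substitution applied to the YAML file. As a rough illustration of the idea only (this is not Flux's implementation), the sketch below replaces `${name}` tokens from a `java.util.Properties` object and `${ENV-NAME}` tokens from an environment map. The class name `SubstitutionSketch`, the method `substitute`, the `${name}` token form, and the choice to leave unknown tokens untouched are all assumptions made for this example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Flux class.
class SubstitutionSketch {
    // Matches ${key}, where key is any run of characters other than '}'.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final String ENV_PREFIX = "ENV-";

    static String substitute(String text, Properties props, Map<String, String> env) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            // ${ENV-NAME} reads from the environment map, anything else from the properties.
            String value = key.startsWith(ENV_PREFIX)
                    ? env.get(key.substring(ENV_PREFIX.length()))
                    : props.getProperty(key);
            // Assumption: unknown tokens are left as-is so a missing key is easy to spot.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("kafka.zookeeper.hosts", "localhost:2181");
        String yaml = "zkHosts: \"${kafka.zookeeper.hosts}\"\nhome: \"${ENV-HOME}\"";
        System.out.println(substitute(yaml, props, System.getenv()));
    }
}
```

Passing the environment as a plain `Map` keeps the sketch easy to exercise with fixed values instead of the real process environment.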
+
+**NOTE:** Flux tries to avoid command line switch collision with the `storm`
command, and allows any other command line
+switches to pass through to the `storm` command.
+
+For example, you can use the `storm` command switch `-c` to override a
topology configuration property. The following
+example command will run Flux and override the `nimbus.seeds` configuration:
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote my_config.yaml -c 'nimbus.seeds=["localhost"]'
+```
+
+### Sample output
+```
+███████╗██╗     ██╗   ██╗██╗  ██╗
+██╔════╝██║     ██║   ██║╚██╗██╔╝
+█████╗  ██║     ██║   ██║ ╚███╔╝
+██╔══╝  ██║     ██║   ██║ ██╔██╗
+██║     ███████╗╚██████╔╝██╔╝ ██╗
+╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
++-         Apache Storm        -+
++-  data FLow User eXperience  -+
+Version: 0.3.0
+Parsing file: /Users/hsimpson/Projects/donut_domination/storm/shell_test.yaml
+---------- TOPOLOGY DETAILS ----------
+Name: shell-topology
+--------------- SPOUTS ---------------
+sentence-spout[1](org.apache.storm.flux.spouts.GenericShellSpout)
+---------------- BOLTS ---------------
+splitsentence[1](org.apache.storm.flux.bolts.GenericShellBolt)
+log[1](org.apache.storm.flux.wrappers.bolts.LogInfoBolt)
+count[1](backtype.storm.testing.TestWordCounter)
+--------------- STREAMS ---------------
+sentence-spout --SHUFFLE--> splitsentence
+splitsentence --FIELDS--> count
+count --SHUFFLE--> log
+--------------------------------------
+Submitting topology: 'shell-topology' to remote cluster...
+```
+
+## YAML Configuration
+Flux topologies are defined in a YAML file. A Flux topology
+definition consists of the following:
+
+  1. A topology name
+  2. A list of topology "components" (named Java objects that will be made available in the environment)
+  3. **EITHER** (A DSL topology definition):
+      * A list of spouts, each identified by a unique ID
+      * A list of bolts, each identified by a unique ID
+      * A list of "stream" objects representing a flow of tuples between spouts and bolts
+  4. **OR** (A JVM class that can produce a `backtype.storm.generated.StormTopology` instance):
+      * A `topologySource` definition.
+
+
+
+For example, here is a simple definition of a wordcount topology using the
YAML DSL:
+
+```yaml
+name: "yaml-topology"
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "spout-1"
+    className: "backtype.storm.testing.TestWordSpout"
+    parallelism: 1
+
+# bolt definitions
+bolts:
+  - id: "bolt-1"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+  - id: "bolt-2"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+
+#stream definitions
+streams:
+  - name: "spout-1 --> bolt-1" # name isn't used (placeholder for logging, UI, etc.)
+    from: "spout-1"
+    to: "bolt-1"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "bolt-1 --> bolt2"
+    from: "bolt-1"
+    to: "bolt-2"
+    grouping:
+      type: SHUFFLE
+
+
+```
+## Property Substitution/Filtering
+It's common for developers to want to easily switch between configurations, for example switching deployment between
+a development environment and a production environment. This can be accomplished by using separate YAML configuration
+files, but that approach would lead to unnecessary duplication, especially in situations where the Storm topology
+does not change, but configuration settings such as host names, ports, and parallelism parameters do.
+
+For this case, Flux offers properties filtering to allow you to externalize values to a `.properties` file and have
+them substituted before the `.yaml` file is parsed.
+
+To enable property filtering, use the `--filter` command line option and specify a `.properties` file. For example,
+if you invoked Flux like so:
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local my_config.yaml --filter dev.properties
+```
+With the following `dev.properties` file:
+
+```properties
+kafka.zookeeper.hosts: localhost:2181
+```
+
+You would then be able to reference those properties by key in your `.yaml`
file using `${}` syntax:
+
+```yaml
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "${kafka.zookeeper.hosts}"
+```
+
+In this case, Flux would replace `${kafka.zookeeper.hosts}` with
`localhost:2181` before parsing the YAML contents.
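
The substitution step can be pictured as a plain text replacement that runs before any YAML parsing. The following Java sketch is illustrative only and is not Flux's actual implementation: the class name `PropertyFilterSketch`, the `substitute` method, and the choice to leave unknown keys untouched are all assumptions made for the example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: not Flux's actual implementation.
class PropertyFilterSketch {
    // Matches ${some.key}; the key is captured in group 1.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} that has a value in props.
    // Assumption: keys without a value are left in the text unchanged.
    static String substitute(String yamlText, Properties props) {
        Matcher m = TOKEN.matcher(yamlText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the replacement works on raw text, the substituted value ends up inside whatever quoting the YAML file already uses around the `${}` token.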
+
+### Environment Variable Substitution/Filtering
+Flux also allows environment variable substitution. For example, if an environment variable named `ZK_HOSTS` is defined,
+you can reference it in a Flux YAML file with the following syntax:
+
+```
+${ENV-ZK_HOSTS}
+```
+
+## Components
+Components are essentially named object instances that are made available as configuration options for spouts and
+bolts. If you are familiar with the Spring framework, components are roughly analogous to Spring beans.
+
+Every component is identified, at a minimum, by a unique identifier (String) and a class name (String). For example,
+the following will make an instance of the `storm.kafka.StringScheme` class available as a reference under the key
+`"stringScheme"`. This assumes the `storm.kafka.StringScheme` has a default constructor.
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+```
+
+### Constructor Arguments, References, Properties and Configuration Methods
+
+####Constructor Arguments
+Arguments to a class constructor can be configured by adding a `constructorArgs` element to a component.
+`constructorArgs` is a list of objects that will be passed to the class's constructor. The following example creates an
+object by calling the constructor that takes a single string as an argument:
+
+```yaml
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "localhost:2181"
+```
+
+####References
+Each component instance is identified by a unique id that allows it to be
used/reused by other components. To
+reference an existing component, you specify the id of the component with the
`ref` tag.
+
+In the following example, a component with the id `"stringScheme"` is created, and later referenced, as an argument
+to another component's constructor:
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+
+  - id: "stringMultiScheme"
+    className: "backtype.storm.spout.SchemeAsMultiScheme"
+    constructorArgs:
+      - ref: "stringScheme" # component with id "stringScheme" must be declared above.
+```
+**N.B.:** References can only be used after (below) the object they point to
has been declared.
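
One way to picture the declaration-order rule is a registry that is filled while the `components` list is processed from top to bottom. The Java sketch below is a hypothetical illustration, not Flux's code: `ComponentRegistrySketch`, `register`, and `resolve` are names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of declaration-order reference resolution; not Flux's code.
class ComponentRegistrySketch {
    private final Map<String, Object> components = new HashMap<>();

    // Called once per "components" entry, in the order the entries appear in the YAML.
    void register(String id, Object instance) {
        components.put(id, instance);
    }

    // Resolves a "ref"; fails if no component with that id has been registered yet.
    Object resolve(String refId) {
        if (!components.containsKey(refId)) {
            throw new IllegalArgumentException(
                "Reference '" + refId + "' used before the component was declared");
        }
        return components.get(refId);
    }
}
```

With this model, a `ref` that appears above its target fails simply because the target has not been registered at the time the reference is resolved.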
+
+####Properties
+In addition to calling constructors with different arguments, Flux also allows
you to configure components using
+JavaBean-like setter methods and fields declared as `public`:
+
+```yaml
+  - id: "spoutConfig"
+    className: "storm.kafka.SpoutConfig"
+    constructorArgs:
+      # brokerHosts
+      - ref: "zkHosts"
+      # topic
+      - "myKafkaTopic"
+      # zkRoot
+      - "/kafkaSpout"
+      # id
+      - "myId"
+    properties:
+      - name: "forceFromStart"
+        value: true
+      - name: "scheme"
+        ref: "stringMultiScheme"
+```
+
+In the example above, the `properties` declaration will cause Flux to look for a public method in the `SpoutConfig` class with
+the signature `setForceFromStart(boolean b)` and attempt to invoke it. If a setter method is not found, Flux will then
+look for a public instance variable with the name `forceFromStart` and attempt to set its value.
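
The setter-then-field lookup can be sketched with plain Java reflection. This is an illustration under assumptions, not Flux's implementation: `PropertySetterSketch`, `applyProperty`, and the stand-in class `DemoConfig` are hypothetical, and the sketch matches setters by name and argument count only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Illustrative sketch only; Flux's real lookup may differ (for example in type matching).
class PropertySetterSketch {

    // Hypothetical stand-in for a configuration class such as SpoutConfig.
    public static class DemoConfig {
        private boolean forceFromStart;   // reachable only through the setter
        public String topic;              // public field without a setter

        public void setForceFromStart(boolean b) { this.forceFromStart = b; }
        public boolean isForceFromStart() { return forceFromStart; }
    }

    // Tries a public one-argument setXxx method first, then a public field named xxx.
    static void applyProperty(Object target, String name, Object value) {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    m.invoke(target, value);
                    return;
                }
            }
            // No setter found: fall back to a public field with the same name.
            Field f = target.getClass().getField(name);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot set property '" + name + "'", e);
        }
    }
}
```

Calling `applyProperty(config, "forceFromStart", true)` goes through the setter, while `applyProperty(config, "topic", "myKafkaTopic")` falls back to the public field.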
+
+References may also be used as property values.
+
+####Configuration Methods
+Conceptually, configuration methods are similar to Properties and Constructor
Args -- they allow you to invoke an
+arbitrary method on an object after it is constructed. Configuration methods
are useful for working with classes that
+don't expose JavaBean methods or have constructors that can fully configure
the object. Common examples include classes
+that use the builder pattern for configuration/composition.
+
+The following YAML example creates a bolt and configures it by calling several
methods:
+
+```yaml
+bolts:
+  - id: "bolt-1"
+    className: "org.apache.storm.flux.test.TestBolt"
+    parallelism: 1
+    configMethods:
+      - name: "withFoo"
+        args:
+          - "foo"
+      - name: "withBar"
+        args:
+          - "bar"
+      - name: "withFooBar"
+        args:
+          - "foo"
+          - "bar"
+```
+
+The signatures of the corresponding methods are as follows:
+
+```java
+ public void withFoo(String foo);
+ public void withBar(String bar);
+ public void withFooBar(String foo, String bar);
+```
+
+Arguments passed to configuration methods work much the same way as
constructor arguments, and support references as
+well.
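
As a rough mental model, each `configMethods` entry becomes one reflective call on the freshly constructed object. The sketch below is hypothetical and not taken from Flux: `ConfigMethodSketch`, `invokeConfigMethod`, and the stand-in `DemoBolt` are invented names, and methods are matched by name and argument count only, which is an assumption of this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Flux's implementation.
class ConfigMethodSketch {

    // Hypothetical bolt exposing the three method signatures listed above.
    public static class DemoBolt {
        public final List<String> calls = new ArrayList<>();
        public void withFoo(String foo) { calls.add("withFoo(" + foo + ")"); }
        public void withBar(String bar) { calls.add("withBar(" + bar + ")"); }
        public void withFooBar(String foo, String bar) {
            calls.add("withFooBar(" + foo + "," + bar + ")");
        }
    }

    // Invokes the first public method whose name and argument count match.
    static void invokeConfigMethod(Object target, String name, Object... args) {
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                    m.invoke(target, args);
                    return;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to invoke " + name, e);
        }
        throw new IllegalArgumentException(
            "No public method " + name + " taking " + args.length + " argument(s)");
    }
}
```

Applying the three YAML entries in order would correspond to three `invokeConfigMethod` calls with `"withFoo"`, `"withBar"`, and `"withFooBar"`.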
+
+### Using Java `enum`s in Constructor Arguments, References, Properties and Configuration Methods
+You can easily use Java `enum` values as arguments in a Flux YAML file, simply
by referencing the name of the `enum`.
+
+For example, Storm's HDFS module includes the following `enum` definition (simplified for brevity):
+
+```java
+public static enum Units {
+    KB, MB, GB, TB
+}
+```
+
+And the `org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy` class has
the following constructor:
+
+```java
+public FileSizeRotationPolicy(float count, Units units)
+
+```
+The following Flux `component` definition could be used to call the
constructor:
+
+```yaml
+  - id: "rotationPolicy"
+    className: "org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy"
+    constructorArgs:
+      - 5.0
+      - MB
+```
+
+The above definition is functionally equivalent to the following Java code:
+
+```java
+// rotate files when they reach 5MB
+FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
+```
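
Turning the YAML scalar `MB` into `Units.MB` requires knowing that the target parameter is an `enum`. A minimal sketch of that conversion, assuming the parameter type is already known, might look as follows. The class `EnumArgSketch`, the method `coerce`, and the local copy of `Units` are hypothetical and only mirror the example above.

```java
// Illustrative sketch only; not Flux's implementation.
class EnumArgSketch {

    // Local copy of the enum from the example above, so the sketch is self-contained.
    public enum Units { KB, MB, GB, TB }

    // Converts a YAML string to an enum constant when the parameter type is an enum;
    // any other value is passed through unchanged.
    static Object coerce(Object yamlValue, Class<?> paramType) {
        if (paramType.isEnum() && yamlValue instanceof String) {
            for (Object constant : paramType.getEnumConstants()) {
                if (((Enum<?>) constant).name().equals(yamlValue)) {
                    return constant;
                }
            }
            throw new IllegalArgumentException(
                "No constant '" + yamlValue + "' in " + paramType.getName());
        }
        return yamlValue;
    }
}
```

Here `coerce("MB", Units.class)` yields the `Units.MB` constant, while a value aimed at a non-enum parameter is returned as it was read from the YAML.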
+
+## Topology Config
+The `config` section is simply a map of Storm topology configuration
parameters that will be passed to the
+`backtype.storm.StormSubmitter` as an instance of the `backtype.storm.Config`
class:
+
+```yaml
+config:
+  topology.workers: 4
+  topology.max.spout.pending: 1000
+  topology.message.timeout.secs: 30
+```
+
+# Existing Topologies
+If you have existing Storm topologies, you can still use Flux to
deploy/run/test them. This feature allows you to
+leverage Flux Constructor Arguments, References, Properties, and Topology
Config declarations for existing topology
+classes.
+
+The easiest way to use an existing topology class is to define
+a `getTopology()` instance method with one of the following signatures:
+
+```java
+public StormTopology getTopology(Map<String, Object> config)
+```
+or:
+
+```java
+public StormTopology getTopology(Config config)
+```
+
+You could then use the following YAML to configure your topology:
+
+```yaml
+name: "existing-topology"
+topologySource:
+  className: "org.apache.storm.flux.test.SimpleTopology"
+```
+
+If the class you would like to use as a topology source has a different method
name (i.e. not `getTopology`), you can
+override it:
+
+```yaml
+name: "existing-topology"
+topologySource:
+  className: "org.apache.storm.flux.test.SimpleTopology"
+  methodName: "getTopologyWithDifferentMethodName"
+```
+
+__N.B.:__ The specified method must accept a single argument of type
`java.util.Map<String, Object>` or
+`backtype.storm.Config`, and return a `backtype.storm.generated.StormTopology`
object.
+
+# YAML DSL
+## Spouts and Bolts
+Spouts and Bolts are configured in their own respective sections of the YAML configuration. Spout and Bolt definitions
+are extensions to the `component` definition that add a `parallelism` parameter that sets the parallelism for a
+component when the topology is deployed.
+
+Because spout and bolt definitions extend `component`, they support constructor arguments, references, and properties as
+well.
+
+Shell spout example:
+
+```yaml
+spouts:
+  - id: "sentence-spout"
+    className: "org.apache.storm.flux.spouts.GenericShellSpout"
+    # shell spout constructor takes 2 arguments: String[], String[]
+    constructorArgs:
+      # command line
+      - ["node", "randomsentence.js"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+```
+
+Kafka spout example:
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+
+  - id: "stringMultiScheme"
+    className: "backtype.storm.spout.SchemeAsMultiScheme"
+    constructorArgs:
+      - ref: "stringScheme"
+
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "localhost:2181"
+
+# Alternative kafka config
+#  - id: "kafkaConfig"
+#    className: "storm.kafka.KafkaConfig"
+#    constructorArgs:
+#      # brokerHosts
+#      - ref: "zkHosts"
+#      # topic
+#      - "myKafkaTopic"
+#      # clientId (optional)
+#      - "myKafkaClientId"
+
+  - id: "spoutConfig"
+    className: "storm.kafka.SpoutConfig"
+    constructorArgs:
+      # brokerHosts
+      - ref: "zkHosts"
+      # topic
+      - "myKafkaTopic"
+      # zkRoot
+      - "/kafkaSpout"
+      # id
+      - "myId"
+    properties:
+      - name: "forceFromStart"
+        value: true
+      - name: "scheme"
+        ref: "stringMultiScheme"
+
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "kafka-spout"
+    className: "storm.kafka.KafkaSpout"
+    constructorArgs:
+      - ref: "spoutConfig"
+
+```
+
+Bolt Examples:
+
+```yaml
+# bolt definitions
+bolts:
+  - id: "splitsentence"
+    className: "org.apache.storm.flux.bolts.GenericShellBolt"
+    constructorArgs:
+      # command line
+      - ["python", "splitsentence.py"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+    # ...
+
+  - id: "log"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+    # ...
+
+  - id: "count"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+    # ...
+```
+## Streams and Stream Groupings
+Streams in Flux are represented as a list of connections (Graph edges, data
flow, etc.) between the Spouts and Bolts in
+a topology, with an associated Grouping definition.
+
+A Stream definition has the following properties:
+
+**`name`:** A name for the connection (optional, currently unused)
+
+**`from`:** The `id` of a Spout or Bolt that is the source (publisher)
+
+**`to`:** The `id` of a Bolt that is the destination (subscriber)
+
+**`grouping`:** The stream grouping definition for the Stream
+
+A Grouping definition has the following properties:
+
+**`type`:** The type of grouping. One of `ALL`, `CUSTOM`, `DIRECT`, `SHUFFLE`, `LOCAL_OR_SHUFFLE`, `FIELDS`, `GLOBAL`, or `NONE`.
+
+**`streamId`:** The Storm stream ID (optional; if unspecified, the default stream is used)
+
+**`args`:** For the `FIELDS` grouping, a list of field names.
+
+**`customClass`:** For the `CUSTOM` grouping, a definition of a custom grouping class instance
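
To make the role of `args` concrete: a `FIELDS` grouping partitions the stream by the values of the named fields, so tuples with equal values for those fields are delivered to the same task. The Java sketch below shows only the general idea of hash-based routing. It is a conceptual illustration, not Storm's actual routing code, and `FieldsGroupingSketch` and `chooseTask` are hypothetical names.

```java
import java.util.List;

// Conceptual sketch of hash-based routing; not Storm's actual implementation.
class FieldsGroupingSketch {

    // Picks a task index from the values of the grouping fields.
    // Equal value lists always map to the same index for a fixed task count.
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }
}
```

With `args: ["word"]`, the grouping values would be the single value of the `word` field, which is why every occurrence of a given word can be counted by one task.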
+
+The `streams` definition example below sets up a topology with the following
wiring:
+
+```
+ kafka-spout --> splitsentence --> count --> log
+```
+
+
+```yaml
+#stream definitions
+# stream definitions define connections between spouts and bolts.
+# note that such connections can be cyclical
+# custom stream groupings are also supported
+
+streams:
+  - name: "kafka --> split" # name isn't used (placeholder for logging, UI, etc.)
+    from: "kafka-spout"
+    to: "splitsentence"
+    grouping:
+      type: SHUFFLE
+
+  - name: "split --> count"
+    from: "splitsentence"
+    to: "count"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "count --> log"
+    from: "count"
+    to: "log"
+    grouping:
+      type: SHUFFLE
+```
+
+### Custom Stream Groupings
+Custom stream groupings are defined by setting the grouping type to `CUSTOM`
and defining a `customClass` parameter
+that tells Flux how to instantiate the custom class. The `customClass`
definition extends `component`, so it supports
+constructor arguments, references, and properties as well.
+
+The example below creates a Stream with an instance of the
`backtype.storm.testing.NGrouping` custom stream grouping
+class.
+
+```yaml
+  - name: "bolt-1 --> bolt2"
+    from: "bolt-1"
+    to: "bolt-2"
+    grouping:
+      type: CUSTOM
+      customClass:
+        className: "backtype.storm.testing.NGrouping"
+        constructorArgs:
+          - 1
+```
+
+## Includes and Overrides
+Flux allows you to include the contents of other YAML files, and have them
treated as though they were defined in the
+same file. Includes may be either files, or classpath resources.
+
+Includes are specified as a list of maps:
+
+```yaml
+includes:
+  - resource: false
+    file: "src/test/resources/configs/shell_test.yaml"
+    override: false
+```
+
+If the `resource` property is set to `true`, the include will be loaded as a
classpath resource from the value of the
+`file` attribute, otherwise it will be treated as a regular file.
+
+The `override` property controls how includes affect the values defined in the
current file. If `override` is set to
+`true`, values in the included file will replace values in the current file
being parsed. If `override` is set to
+`false`, values in the current file being parsed will take precedence, and the
parser will refuse to replace them.
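
The effect of `override` can be modeled as a merge of two key/value maps. The sketch below is a simplified illustration, not Flux's implementation: it merges flat top-level keys only, and the names `IncludeMergeSketch` and `merge` are hypothetical. How Flux treats nested sections is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of include/override precedence; not Flux's implementation.
class IncludeMergeSketch {

    // Returns the values of the current file combined with those of an included file.
    // override == true : included values replace values already present.
    // override == false: values already present are kept; only missing keys are added.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> included,
                                     boolean override) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> e : included.entrySet()) {
            if (override || !result.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

For example, if both files define `topology.workers`, the merged result keeps the current file's value when `override` is `false` and takes the included file's value when it is `true`.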
+
+**N.B.:** Includes are not yet recursive. Includes from included files will be
ignored.
+
+
+## Basic Word Count Example
+
+This example uses a spout implemented in JavaScript, a bolt implemented in Python, and a bolt implemented in Java.
+
+Topology YAML config:
+
+```yaml
+---
+name: "shell-topology"
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "sentence-spout"
+    className: "org.apache.storm.flux.spouts.GenericShellSpout"
+    # shell spout constructor takes 2 arguments: String[], String[]
+    constructorArgs:
+      # command line
+      - ["node", "randomsentence.js"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+
+# bolt definitions
+bolts:
+  - id: "splitsentence"
+    className: "org.apache.storm.flux.bolts.GenericShellBolt"
+    constructorArgs:
+      # command line
+      - ["python", "splitsentence.py"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+
+  - id: "log"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+
+  - id: "count"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+
+#stream definitions
+# stream definitions define connections between spouts and bolts.
+# note that such connections can be cyclical
+# custom stream groupings are also supported
+
+streams:
+  - name: "spout --> split" # name isn't used (placeholder for logging, UI, etc.)
+    from: "sentence-spout"
+    to: "splitsentence"
+    grouping:
+      type: SHUFFLE
+
+  - name: "split --> count"
+    from: "splitsentence"
+    to: "count"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "count --> log"
+    from: "count"
+    to: "log"
+    grouping:
+      type: SHUFFLE
+```
+
+
+## Micro-Batching (Trident) API Support
+Currently, the Flux YAML DSL only supports the Core Storm API, but support for Storm's micro-batching API is planned.
+
+To use Flux with a Trident topology, define a topology getter method and
reference it in your YAML config:
+
+```yaml
+name: "my-trident-topology"
+
+config:
+  topology.workers: 1
+
+topologySource:
+  className: "org.apache.storm.flux.test.TridentTopologySource"
+  # Flux will look for "getTopology", this will override that.
+  methodName: "getTopologyWithDifferentMethodName"
+```
Added: storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md
(added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md Thu
Mar 17 22:48:32 2016
@@ -0,0 +1,41 @@
+---
+title: Azure Event Hubs Integration
+layout: documentation
+documentation: true
+version: v0.10.0
+---
+
+Storm spout and bolt implementation for Microsoft Azure Eventhubs
+
+### build ###
+    mvn clean package
+
+### run sample topology ###
+To run the sample topology, you need to modify the config.properties file with
+the eventhubs configurations. Here is an example:
+
+    eventhubspout.username = [username: policy name in EventHubs Portal]
+    eventhubspout.password = [password: shared access key in EventHubs Portal]
+    eventhubspout.namespace = [namespace]
+    eventhubspout.entitypath = [entitypath]
+    eventhubspout.partitions.count = [partitioncount]
+
+    # if not provided, will use storm's zookeeper settings
+    # zookeeper.connectionstring=zookeeper0:2181,zookeeper1:2181,zookeeper2:2181
+
+    eventhubspout.checkpoint.interval = 10
+    eventhub.receiver.credits = 1024
+
+Then you can use storm.cmd to submit the sample topology:
+    storm jar {jarfile} com.microsoft.eventhubs.samples.EventCount {topologyname} {spoutconffile}
+    where the {jarfile} should be: eventhubs-storm-spout-{version}-jar-with-dependencies.jar
+
+### Run EventHubSendClient ###
+We have included a simple EventHubs send client for testing purposes. You can run the client like this:
+    java -cp .\target\eventhubs-storm-spout-{version}-jar-with-dependencies.jar com.microsoft.eventhubs.client.EventHubSendClient
+    [username] [password] [entityPath] [partitionId] [messageSize] [messageCount]
+If you want to send messages to all partitions, use "-1" as partitionId.
+
+### Windows Azure Eventhubs ###
+ http://azure.microsoft.com/en-us/services/event-hubs/
+