svn commit: r1735516 [1/2] - in /storm/branches/bobby-versioned-site: ./ releases/0.10.0/

bobby Sat, 19 Mar 2016 03:58:25 -0700

Author: bobby
Date: Thu Mar 17 22:48:32 2016
New Revision: 1735516

URL: http://svn.apache.org/viewvc?rev=1735516&view=rev
Log:
Pulled in some more files from asf_site in git


Added:
    storm/branches/bobby-versioned-site/Powered-By.md
    storm/branches/bobby-versioned-site/releases/0.10.0/flux.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-hbase.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-hdfs.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-hive.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-jdbc.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-kafka.md
    storm/branches/bobby-versioned-site/releases/0.10.0/storm-redis.md
Modified:
    storm/branches/bobby-versioned-site/getting-help.md
    storm/branches/bobby-versioned-site/index.html

Added: storm/branches/bobby-versioned-site/Powered-By.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/Powered-By.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/Powered-By.md (added)
+++ storm/branches/bobby-versioned-site/Powered-By.md Thu Mar 17 22:48:32 2016
@@ -0,0 +1,1040 @@
+---
+title: Companies Using Apache Storm
+layout: documentation
+documentation: true
+---
+Want to be added to this page? Send an email 
[here](mailto:[email protected]).
+
+<table class="table table-striped">
+
+<tr>
+<td>
+<a href="http://groupon.com">Groupon</a>
+</td>
+<td>
+<p>
+At Groupon we use Storm to build real-time data integration systems. Storm 
helps us analyze, clean, normalize, and resolve large amounts of non-unique 
data points with low latency and high throughput.
+</p>
+</td>
+</tr>
+
+<tr>
+<td><a href="http://www.weather.com/">The Weather Channel</a></td>
+<td>
+<p>At Weather Channel we use several Storm topologies to ingest and persist 
weather data. Each topology is responsible for fetching one dataset from an 
internal or external network (the Internet), reshaping the records for use by 
our company, and persisting the records to relational databases. It is 
particularly useful to have an automatic mechanism for repeating attempts to 
download and manipulate the data when there is a hiccup.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.fullcontact.com/">FullContact</a>
+</td>
+<td>
+<p>
+At FullContact we currently use Storm as the backbone of the system which 
synchronizes our Cloud Address Book with third party services such as Google 
Contacts and Salesforce. We also use it to provide real-time support for our 
contact graph analysis and federated contact search systems.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://twitter.com">Twitter</a>
+</td>
+<td>
+<p>
+Storm powers a wide variety of Twitter systems, ranging in applications from 
discovery, realtime analytics, personalization, search, revenue optimization, 
and many more. Storm integrates with the rest of Twitter's infrastructure, 
including database systems (Cassandra, Memcached, etc), the messaging 
infrastructure, Mesos, and the monitoring/alerting systems. Storm's isolation 
scheduler makes it easy to use the same cluster both for production 
applications and in-development applications, and it provides a sane way to do 
capacity planning.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.com">Yahoo!</a>
+</td>
+<td>
+<p>
+Yahoo! is developing a next generation platform that enables the convergence 
of big-data and low-latency processing. While Hadoop is our primary technology 
for batch processing, Storm empowers stream/micro-batch processing of user 
events, content feeds, and application logs. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.co.jp/">Yahoo! JAPAN</a>
+</td>
+<td>
+<p>
+Yahoo! JAPAN is a leading web portal in Japan. Storm applications are 
processing various streaming data such as logs or social data. We use Storm to 
feed contents, monitor systems, detect trending topics, and crawl on websites.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.webmd.com">WebMD</a>
+</td>
+<td>
+<p>
+We use Storm to power our Medscape Medpulse mobile application which allows 
medical professionals to follow important medical trends with Medscape's 
curated Today on Twitter feed and selection of blogs. Storm topology is 
capturing and processing tweets with twitter streaming API, enhance tweets with 
metadata and images, do real time NLP and execute several business rules. Storm 
also monitors selection of blogs in order to give our customers real-time 
updates.  We also use Storm for internal data pipelines to do ETL and for our 
internal marketing platform where time and freshness are essential.
+</p>
+<p>
+We use Storm to power our search indexing process.  We continue to discover 
new use cases for Storm and it has become one of the core components in our 
technology stack.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.spotify.com">Spotify</a>
+</td>
+<td>
+<p>
+Spotify serves streaming music to over 10 million subscribers and 40 million 
active users. Storm powers a wide range of real-time features at Spotify, 
including music recommendation, monitoring, analytics, and ads targeting. 
Together with Kafka, memcached, Cassandra, and netty-zmtp based messaging, 
Storm enables us to build low-latency fault-tolerant distributed systems with 
ease.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.infochimps.com">Infochimps</a>
+</td>
+<td>
+<p>
+Infochimps uses Storm as part of its Big Data Enterprise Cloud. Specifically, 
it uses Storm as the basis for one of three of its cloud data services - 
namely, Data Delivery Services (DDS), which uses Storm to provide a 
fault-tolerant and linearly scalable enterprise data collection, transport, and 
complex in-stream processing cloud service. 
+</p>
+
+<p>
+In much the same way that Hadoop provides batch ETL and large-scale batch 
analytical processing, the Data Delivery Service provides real-time ETL and 
large-scale real-time analytical processing - the perfect complement to 
Hadoop (or in some cases, what you needed instead of Hadoop).
+</p>
+
+<p>
+DDS uses both Storm and Kafka along with a host of additional technologies to 
provide an enterprise-class real-time stream processing solution with features 
including:
+</p>
+
+<ul>
+<li>
+Integration connections to any variety of data sources in a way that is robust 
yet non-invasive
+</li>
+<li>
+Optimizations for highly scalable, reliable data import and distributed ETL 
(extract, transform, load), fulfilling data transport needs
+</li>
+<li>
+Developer tools for rapid development of decorators, which perform the 
real-time stream processing
+</li>
+<li>
+Guaranteed delivery framework and data failover snapshots to send processed 
data to analytics systems, databases, file systems, and applications with 
extreme reliability
+</li>
+<li>
+Rapid solution development and deployment, along with our expert Big Data 
methodology and best practices
+</li>
+</ul>
+
+<p>Infochimps has extensive experience in deploying its DDS to power 
large-scale clickstream web data flows, massive Twitter stream processes, 
Foursquare event processing, customer purchase data, product pricing data, and 
more.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://healthmarketscience.com/">Health Market Science</a>
+</td>
+<td>
+<p>
+Health Market Science (HMS) provides data management as a service for the 
healthcare industry.  Storm is at the core of the HMS big data platform 
functioning as the data ingestion mechanism, which orchestrates the data flow 
across multiple persistence mechanisms that allow HMS to deliver Master Data 
Management (MDM) and analytics capabilities for wide range of healthcare needs: 
compliance, integrity, data quality, and operational decision support.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.verisigninc.com/">Verisign</a>
+</td>
+<td>
+<p>
+Verisign, a global leader in domain names and Internet security, enables 
Internet navigation for many of the world's most recognized domain names and 
provides protection for enterprises around the world.  Ensuring the security, 
stability, and resiliency of key Internet infrastructure and services, 
including the .COM and .NET top level domains and two of the Internet's DNS 
root servers, is at the heart of Verisign's mission.  Storm is a component of 
our data analytics stack that powers a variety of real-time applications.  One 
example is security monitoring where we are leveraging Storm to analyze the 
network telemetry data of our globally distributed infrastructure in order to 
detect and mitigate cyber attacks.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://cerner.com/">Cerner</a>
+</td>
+<td>
+<p>
+Cerner is a leader in health care information technology. We have been using 
Storm since its release to process massive amounts of clinical data in 
real-time. Storm integrates well in our architecture, allowing us to quickly 
provide clinicians with the data they need to make medical decisions.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.aeris.com/">Aeris Communications</a>
+</td>
+<td>
+<p>
+Aeris Communications has the only cellular network that was designed and built 
exclusively for machines. Our ability to provide scalable, reliable real-time 
analytics - powered by Storm - for machine to machine (M2M) communication 
offers immense value to our customers. We have been using Storm in production since 
Q1 of 2013.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://flipboard.com/">Flipboard</a>
+</td>
+<td>
+<p>
+Flipboard is the world's first social magazine, a single place to keep up 
with everything you care about and collect it in ways that reflect you. 
Inspired by the beauty and ease of print media, Flipboard is designed so you 
can easily flip through news from around the world or stories from right at 
home, helping people find the one thing that can inform, entertain or even 
inspire them every day.
+</p>
+<p>
+We are using Storm across a wide range of our services from content search, to 
realtime analytics, to generating custom magazine feeds. We then integrate 
Storm across our infrastructure within systems like ElasticSearch, HBase, 
Hadoop and HDFS to create a highly scalable data platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.rubiconproject.com/">Rubicon Project</a>
+</td>
+<td>
+<p>
+Storm is being used in production mode at the Rubicon Project to analyze the 
results of auctions of ad impressions on its RTB exchange as they occur.  It is 
currently processing around 650 million auction results in three data centers 
daily (with 3 separate Storm clusters). One simple application is identifying 
new creatives (ads) in real time for ad quality purposes.  A more sophisticated 
application is an "Inventory Valuation Service" that uses DRPC to return 
appraisals of new impressions before the auction takes place.  The appraisals 
are used for various optimization problems, such as deciding whether to auction 
an impression or skip it when close to maximum capacity.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ooyala.com/">Ooyala</a>
+</td>
+<td>
+<p>
+Ooyala powers personalized multi-screen video experiences for some of the 
world's largest networks, brands and media companies. We provide all the 
technology and tools our customers need to manage, distribute and monetize 
digital video content at a global scale.
+</p>
+
+<p>
+At the core of our technology is an analytics engine that processes over two 
billion analytics events each day, derived from nearly 200 million viewers 
worldwide who watch video on an Ooyala-powered player.
+</p>
+
+<p>
+Ooyala will be deploying Storm in production to give our customers real-time 
streaming analytics on consumer viewing behavior and digital content trends. 
Storm enables us to rapidly mine one of the world's largest online video data 
sets to deliver up-to-the-minute business intelligence ranging from real-time 
viewing patterns to personalized content recommendations to dynamic programming 
guides and dozens of other insights for maximizing revenue with online video.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.taobao.com/index_global.php">Taobao</a>
+</td>
+<td>
+<p>
+We make statistics of logs and extract useful information from the statistics 
in almost real-time with Storm.  Logs are read from Kafka-like persistent 
message queues into spouts, then processed and emitted over the topologies to 
compute desired results, which are then stored into distributed databases to be 
used elsewhere. Input log count varies from 2 million to 1.5 billion every 
day, whose size is up to 2 terabytes among the projects.  The main challenge 
here is not only real-time processing of big data set; storing and persisting 
result is also a challenge and needs careful design and implementation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.alibaba.com/">Alibaba</a>
+</td>
+<td>
+<p>
+Alibaba is the leading B2B e-commerce website in the world. We use storm to 
process the application log and the data change in database to supply realtime 
stats for data apps.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://iQIYI.COM">iQIYI</a>
+</td>
+<td>
+<p>
+iQIYI is China's largest online video platform. We are using Storm in our 
video advertising system, video recommendation system, log analysis system and 
many other scenarios. Now we have several standalone Storm clusters, and we 
also have Storm clusters on Mesos and on Yarn. Kafka-Storm integration and 
Storm-HBase integration are quite common in our production environment. We 
have great interest in new developments in the integration of Storm with 
other applications, like HBase, HDFS and Kafka.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.baidu.com/">Baidu</a>
+</td>
+<td>
+<p>
+Baidu offers top searching technology services for websites, audio files and 
images. My group uses Storm to process the searching logs to supply realtime 
stats for accounting pv, ar-time and so on.
+This project helps Ops to determine and monitor services status and can do 
great things in the future.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yelp.com/">Yelp</a>
+</td>
+<td>
+<p>
+Yelp is using Storm with <a href="http://pyleus.org/">Pyleus</a> to build a 
platform for developers to consume and process high throughput streams of data 
in real time. We have ongoing projects to use Storm and Pyleus for overhauling 
our internal application metrics pipeline, building an automated Python profile 
analysis system, and for general ETL operations. As its support for non-JVM 
components matures, we hope to make Storm the standard way of processing 
streaming data at Yelp.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.klout.com/">Klout</a>
+</td>
+<td>
+<p>
+Klout helps everyone discover and be recognized for their influence by 
analyzing engagement with their content across social networks. Our analysis 
powers a daily Klout Score on a scale from 1-100 that shows how much influence 
social media users have and on what topics. We are using Storm to develop a 
realtime scoring and moments generation pipeline. Leveraging Storm's intuitive 
Trident abstraction we are able to create complex topologies which stream data 
from our network collectors via Kafka, processed and written out to HDFS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.loggly.com">Loggly</a>
+</td>
+<td>
+<p>
+Loggly is the world's most popular cloud-based log management. Our cloud-based 
log management service helps DevOps and technical teams make sense of the 
massive quantity of logs that are being produced by a growing number of 
cloud-centric applications in order to solve operational problems faster. 
Storm is the heart of our ingestion pipeline where it filters, parses and 
analyses billions of log events all-day, every day and in real-time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://premise.is/">premise.is</a>
+</td>
+<td>
+<p>
+We're building a platform for alternative, bottom-up, high-granularity 
econometric data capture, particularly targeting opaque developing economies 
(i.e., Argentina might lie about their inflation statistics, but their black 
market certainly doesn't). Basically we get to funnel hedge fund money into 
improving global economic transparency. 
+</p>
+<p>
+We've been using Storm in production since January 2012 as a streaming, 
time-indexed web crawl + extraction + machine learning-based semantic markup 
flow (about 60 physical nodes comparable to m1.large; generating a modest 
25GB/hr incremental). We wanted to have an end-to-end push-based system where 
new inputs get percolated through the topology in realtime and appear on the 
website, with no batch jobs required in between steps. Storm has been really 
integral to realizing this goal.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://www.wego.com/">Wego</a>
+</td>
+<td>
+<p>About Wego, we are one of the world's most comprehensive travel 
metasearch engines, operating in 42 markets worldwide and used by millions of 
travelers to save time, pay less and travel more. We compare and display 
real-time flights, hotel pricing and availability from hundreds of leading 
travel sites from all around the world on one simple screen.</p>
+
+<p>At the heart of our products, Storm helps us to stream real-time 
meta-search data from our partners to end-users. Since data comes from many 
sources and with different timing, Storm topology concept naturally solves 
concurrency issues while helping us to continuously merge, slice and clean all 
the data. Additionally with a few tricks and tools provided in Storm we can 
easily apply incremental updates to improve the flow of our data (1-5GB/minute).</p>
+ 
+<p>With its simplicity, scalability, and flexibility, Storm does not only 
improve our current products but more importantly changes the way we work with 
data. Instead of keeping data static and crunching it once a while, we 
constantly move data all around, making use of different technologies, 
evaluating new ideas and building new products. We stream critical data to 
memory for fast access while continuously crunching and directing huge amount 
of data into various engines so that we can evaluate and make use of data 
instantly. Previously, this kind of system requires to setup and maintain quite 
a few things but with Storm all we need is half day of coding and a few seconds 
to deploy. In this sense we never think Storm is to serve our products but 
rather to evolve our products.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://rocketfuel.com/">RocketFuel</a>
+</td>
+<td>
+<p>
+At Rocket Fuel (an ad network) we are building a real time platform on top of 
Storm which imitates the time critical workflows of existing Hadoop based ETL 
pipeline. This platform tracks impressions, clicks, conversions, bid requests 
etc. in real time. We are using Kafka as message queue. To start with we are 
pushing per minute aggregations directly to MySQL, but we plan to go finer than 
one minute and may bring HBase in to the picture to handle increased write 
load. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://quicklizard.com/">QuickLizard</a>
+</td>
+<td>
+<p>
+QuickLizard builds solution for automated pricing for companies that have many 
products in their lists. Prices are influenced by multiple factors internal and 
external to company.
+</p>
+
+<p>
+Currently we use Storm to choose products that need to be priced. We get real 
time stream of events from client site and filters them to get much more light 
stream of products that need to be processed by our procedures to get price 
recommendation.
+</p>
+
+<p>
+In plans: use Storm also for real time data mining model calculation that 
should match products described on competitor sites to client products.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://spider.io/">spider.io</a>
+</td>
+<td>
+<p>
+At spider.io we've been using Storm as a core part of our classification 
engine since October 2011. We run Storm topologies to combine, analyse and 
classify real-time streams of internet traffic, to identify suspicious or 
undesirable website activity. Over the past 7 months we've expanded our use of 
Storm, so it now manages most of our real-time processing. Our classifications 
are displayed in a custom analytics dashboard, where Storm's distributed remote 
procedure call interface is used to gather data from our database and metadata 
services. DRPC allows us to increase the responsiveness of our user interface 
by distributing processing across a cluster of Amazon EC2 instances.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://8digits.com/">8digits</a>
+</td>
+<td>
+<p>
+At 8digits, we are using Storm in our analytics engine, which is one of the 
most crucial parts of our infrastructure. We are utilizing several cloud 
servers with multiple cores each for the purpose of running a real-time system 
making several complex calculations. Storm is a proven, solid and a powerful 
framework for most of the big-data problems.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="https://www.alipay.com/">Alipay</a>
+</td>
+<td>
+<p>
+Alipay is China's leading third-party online payment platform. We are using 
Storm in many scenarios:
+</p>
+
+<ol>
+<li>
+Calculate realtime trade quantity, trade amount, the TOP N seller trading 
information, user register count. More than 100 million messages per day.
+</li>
+<li>
+Log processing, more than 6T data per day.
+</li>
+</ol>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://navisite.com/">NaviSite</a>
+</td>
+<td>
+<p>
+We are using Storm as part of our server event log monitoring/auditing system. 
 We send log messages from thousands of servers into a RabbitMQ cluster and 
then use Storm to check each message against a set of regular expressions.  If 
there is a match (&lt; 1% of messages), then the message is sent to a bolt that 
stores data in a Mongo database.  Right now we are handling a load of somewhere 
around 5-10k messages per second, however we tested our existing RabbitMQ + 
Storm clusters up to about 50k per second.  We have plans to do real time 
intrusion detection as an enhancement to the current log message reporting 
system. 
+</p>
+
+<p>
+We have Storm deployed on the NaviSite Cloud platform.  We have a ZK cluster 
of 3 small VMs, 1 Nimbus VM and 16 dual core/4GB VMs as supervisors.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.paywithglyph.com">Glyph</a>
+</td>
+<td>
+<p>
+Glyph is in the business of providing credit card rewards intelligence to 
consumers. At a given point of sale Glyph suggest its users what are the best 
cards to be used at a given merchant location that will provide maximum 
rewards. Glyph also provide suggestion on the cards the user should carry to 
earn maximum rewards based on his personal spending habits. Glyph provides this 
information to the user by retrieving and analyzing credit card transactions 
from banks. Storm is used in Glyph to perform this retrieval and analysis in 
realtime. We are using Memcached in conjunction with Storm for handling 
sessions. We are impressed by how Storm makes high availability and reliability 
of Glyph services possible. We are now using Storm and Clojure in building 
Glyph data analytics and insights services. We have open-sourced node-drpc 
wrapper module for easy Storm DRPC integration with NodeJS.
+</p>
+</td>
+</tr>
+<tr>
+<td>
+<a href="http://heartbyte.com/">Heartbyte</a>
+</td>
+<td>
+<p>
+At Heartbyte, Storm is a central piece of our realtime audience participation 
platform.  We are often required to process a 'vote' per second from hundreds 
of thousands of mobile devices simultaneously and process / aggregate all of 
the data within a second.  Further, we are finding that Storm is a great 
alternative to other ingest tools for Hadoop/HBase, which we use for batch 
processing after our events conclude.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://2lemetry.com/">2lemetry</a>
+</td>
+<td>
+<p>
+2lemetry uses Storm to power its real time analytics on top of the m2m.io 
offering. 2lemetry is partnered with Sprint, Verizon, AT&T, and Arrow 
Electronics to power IoT applications world wide. Some of 2lemetry's larger 
projects include RTX, Kontron, and Intel. 2lemetry also works with many 
professional sporting teams to parse data in real time. 2lemetry receives 
events for every touch of the ball in every MLS soccer match. Storm is used to 
look for trends like passing tendencies as they develop during the game. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.nodeable.com/">Nodeable</a>
+</td>
+<td>
+<p>
+Nodeable uses Storm to deliver real-time continuous computation of the data we 
consume. Storm has made it significantly easier for us to scale our service 
more efficiently while ensuring the data we deliver is timely and accurate.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://twitsprout.com/">TwitSprout</a>
+</td>
+<td>
+<p>
+At TwitSprout, we use Storm to analyze activity on Twitter to monitor mentions 
of keywords (mostly client product and brand names) and trigger alerts when 
activity around a certain keyword spikes above normal levels. We also use Storm 
to back the data behind the live-infographics we produce for events sponsored 
by our clients. The infographics are usually in the form of a live dashboard 
that helps measure the audience buzz across social media as it relates to the 
event in realtime.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.happyelements.com/">HappyElements</a>
+</td>
+<td>
+<p>
+<a href="http://www.happyelements.com">HappyElements</a> is a leading social 
game developer on Facebook and other SNS platforms. We developed a real time 
data analysis program based on storm to analyze user activity in real time.  
Storm is very easy to use, stable, scalable and maintainable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.idexx.com/view/xhtml/en_us/corporate/home.jsf">IDEXX 
Laboratories</a>
+</td>
+<td>
+<p>
+IDEXX Laboratories is the leading maker of software and diagnostic instruments 
for the veterinary market. We collect and analyze veterinary medical data from 
thousands of veterinary clinics across the US. We recently embarked on a 
project to upgrade our aging data processing infrastructure that was unable to 
keep up with the rapid increase in the volume, velocity and variety of data 
that we were processing.
+</p>
+
+<p>
+We are utilizing the Storm system to take in the data that is extracted from 
the medical records in a number of different schemas, transform it into a 
standard schema that we created and store it in an Oracle RDBMS database. It is 
basically a souped up distributed ETL system. Storm takes on the plumbing 
necessary for a distributed system and is very easy to write code for. The 
ability to create small pieces of functionality and connect them together gives 
us the ultimate flexibility to parallelize each of the pieces differently.
+</p>
+
+<p>
+Our current cluster consists of four supervisor machines running 110 tasks 
inside 32 worker processes. We run two different topologies which receive 
messages and communicate with each other via RabbitMQ. The whole thing is 
deployed on Amazon Web Services and utilizes S3 for some intermediate storage, 
Redis as a key/value store and Oracle RDS for RDBMS storage. The bolts are all 
written in Java using the Spring framework with Hibernate as an ORM.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.umeng.com/">Umeng</a>
+</td>
+<td>
+Umeng is the leading and largest provider of mobile app analytics and 
developer services platform in China. Storm powers Umeng's realtime analytics 
platform, processing billions of data points per day and growing. We also use 
Storm in other products which requires realtime processing and it has become 
the core infrastructure in our company. 
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.admaster.com.cn/">Admaster</a>
+</td>
+<td>
+<p>
+We provide monitoring and precise delivery for Internet advertising. We use 
Storm to do the following:
+</p>
+
+<ol>
+<li>Calculate PV, UV of every advertisement.</li>
+<li>Simple data cleaning: filter out data which format error, filter out 
cheating data (the pv more than certain value)</li>
+</ol>
+Our cluster has 8 nodes, process several billions messages per day, about 
200GB.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://socialmetrix.com/en/">SocialMetrix</a>
+</td>
+<td>
+<p>
+Since its release, Storm was a perfect fit to our needs of real time 
monitoring. Its powerful API, easy administration and deploy, enabled us to 
rapidly build solutions to monitor presidential elections, several major events 
and currently it is the processing core of our new product "Socialmetrix 
Eventia".
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://needium.com/">Needium</a>
+</td>
+<td>
+<p>
+At Needium we love Ruby and JRuby. The Storm platform offers the right balance 
between simplicity, flexibility and scalability. We created RedStorm, a Ruby 
DSL for Storm, to keep on using Ruby on top of the power of Storm by leveraging 
Storm's JVM foundation with JRuby. We currently use Storm as our Twitter 
realtime data processing pipeline. We have Storm topologies for content 
filtering, geolocalisation and classification. Storm allows us to architect 
our pipeline for the Twitter full firehose scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://parse.ly/">Parse.ly</a>
+</td>
+<td>
+<p>
+Parse.ly is using Storm for its web/content analytics system. We have a 
home-grown data processing and storage system built with Python and Celery, 
with backend stores in Redis and MongoDB. We are now using Storm for real-time 
unique visitor counting and are exploring options for using it for some of our 
richer data sources such as social share data and semantic content metadata.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.parc.com/">PARC</a>
+</td>
+<td>
+<p>
+High Performance Graph Analytics & Real-time Insights Research team at PARC 
uses Storm as one of the building blocks of their PARC Analytics Cloud 
infrastructure which comprises of Nebula based Openstack, Hadoop, SAP HANA, 
Storm, PARC Graph Analytics, and machine learning toolbox to enable researchers 
to process real-time data feeds from Sensors, web, network, social media, and 
security traces and easily ingest any other real-time data feeds of interest 
for PARC researchers.
+</p>
+<p>
+PARC researchers are working with a number of industry collaborators developing 
new tools, algorithms, and models to analyze massive amounts of e-commerce, web 
clickstreams, 3rd party syndicated data, cohort data, social media data 
streams, and structured data from RDBMS, NOSQL, and NEWSQL systems in near 
real-time. PARC  team is developing a reference architecture and benchmarks for 
their near real-time automated insight discovery platform combining the power 
of all above tools and PARC's applied research in machine learning, graph 
analytics, reasoning, clustering, and contextual recommendations. The High 
Performance Graph Analytics & Real-time Insights research at PARC is headed by 
Surendra Reddy<http://www.linkedin.com/in/skreddy>.  If you are interested to 
learn more about our use/experience of using Storm or to know more about our 
research or to collaborate with PARC in this area, please feel free to contact 
[email protected].
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://gumgum.com/">GumGum</a>
+</td>
+<td>
+<p>
+GumGum, the leading in-image advertising platform for publishers and brands, 
uses Storm to produce real-time data. Storm and Trident-based topologies 
consume various ad-related events from Kafka and persist the aggregations in 
MySQL and HBase. This architecture will eventually replace most existing daily 
Hadoop map reduce jobs. There are also plans for Kafka + Storm to replace 
existing distributed queue processing infrastructure built with Amazon SQS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.crowdflower.com/";>CrowdFlower</a>
+</td>
+<td>
+<p>
+CrowdFlower is using Storm with Kafka to generalize our data stream
+aggregation and realtime computation infrastructure. We replaced our
+homegrown aggregation solutions with Storm because it simplified the
+creation of fault tolerant systems. We were already using Zookeeper
+and Kafka, so Storm allowed us to build more generic abstractions for
+our analytics using tools that we had already deployed and
+battle-tested in production.
+</p>
+
+<p>
+We are currently writing to DynamoDB from Storm, so we are able to
+scale our capacity quickly by bringing up additional supervisors and
+tweaking the throughput on our Dynamo tables. We look forward to
+exploring other uses for Storm in our system, especially with the
+recent release of Trident.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.dsbox.com";>Digital Sandbox</a>
+</td>
+<td>
+<p>
+At Digital Sandbox we use Storm to enable our open source information feed 
monitoring system.  The system uses Storm to constantly monitor and pull data 
from structured and unstructured information sources across the internet.  For 
each found item, our topology applies natural language processing based concept 
analysis, temporal analysis, geospatial analytics and a prioritization 
algorithm to enable users to monitor large special events, public safety 
operations, and topics of interest to a multitude of individual users and teams.
+</p>
+ 
+<p>
+Our system is built using Storm for feed retrieval and annotation, Python with 
Flask and jQuery for business logic and web interfaces, and MongoDB for data 
persistence. We use NLTK for natural language processing and the WordNet, 
GeoNames, and OpenStreetMap databases to enable feed item concept extraction 
and geolocation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://hallo.me/";>Hallo</a>
+</td>
+<td>
+With several mainstream celebrities and very popular YouTubers using Hallo to 
communicate with their fans, we needed a good solution to notify users via push 
notifications and make sure that the celebrity messages were delivered to 
follower timelines in near realtime. Our initial approach for broadcast push 
notifications would take anywhere from 2-3 hours. After re-engineering our 
solution on top of Storm, that time has been cut down to 5 minutes on a very 
small cluster. With the user base growing and user need for realtime 
communication, we are very happy knowing that we can easily scale Storm by 
adding nodes to maintain a baseline QoS for our users.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://keepcon.com/";>Keepcon</a>
+</td>
+<td>
+We provide moderation services for classifieds, kids communities, newspapers, 
chat rooms, Facebook fan pages, YouTube channels, reviews, and all kinds of UGC. 
We use Storm to integrate with our clients, find evidence within each 
text, persist to Cassandra and Elasticsearch, and send results back to 
our clients.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.visiblemeasures.com/";>Visible Measures</a>
+</td>
+<td>
+<p>
+Visible Measures powers video campaigns and analytics for publishers and
+advertisers, tracking data for hundreds of millions of videos, and billions
+of views. We are using Storm to process viewing behavior data in real time and 
make
+the information immediately available to our customers. We read events from
+various push and pull sources, including a Kestrel queue, filter and
+enrich the events in Storm topologies, and persist the events to Redis,
+HDFS and Vertica for real-time analytics and archiving. We are currently
+experimenting with Trident topologies, and figuring out how to move more
+of our Hadoop-based batch processing into Storm.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.o2mc.eu/en/";>O2mc</a>
+</td>
+<td>
+<p>
+One of the core products of O2mc is called O2mc Community. O2mc Community 
performs multilingual, realtime sentiment analysis with very low latency and 
distributes the analyzed results to numerous clients. The input is extracted 
from source systems like Twitter, Facebook, e-mail and many more. After the 
analysis has taken place on Storm, the results are streamed to any output 
system ranging from HTTP streaming to clients to direct database insertion to 
an external business process engine to kickstart a process.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.theladders.com";>The Ladders</a>
+</td>
+<td>
+<p>
+TheLadders has been committed to finding the right person for the right job 
since 2003. We're using Storm in a variety of ways and are happy with its 
versatility, robustness, and ease of development. We use Storm in conjunction 
with RabbitMQ for such things as sending hiring alerts: when a recruiter 
submits a job to our site, Storm processes that event and will aggregate 
jobseekers whose profiles match the position. That list is subsequently 
batch-processed to send an email to the list of jobseekers. We also use Storm 
to persist events for Business Intelligence and internal event tracking. We're 
continuing to find uses for Storm where fast, asynchronous, real-time event 
processing is a must.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://semlab.nl";>SemLab</a>
+</td>
+<td>
+<p>
+SemLab develops software for knowledge discovery and information support. Our 
ViewerPro platform uses information extraction, natural language processing and 
semantic web technologies to extract structured data from unstructured sources, 
in domains such as financial news feeds and legal documents. We have 
successfully adapted ViewerPro's processing framework to run on top of Storm. 
The transition to Storm has made ViewerPro a much more scalable product, 
allowing us to process more in less time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://visualrevenue.com/";>Visual Revenue</a>
+</td>
+<td>
+<p>
+Here at Visual Revenue, we built a decision support system to help online 
editors to make choices on what, when, and where to promote their content in 
real-time. Storm is the backbone of our real-time data processing and aggregation 
pipelines.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.peerindex.com/";>PeerIndex</a>
+</td>
+<td>
+<p>
+PeerIndex is working to deliver influence at scale. PeerIndex does this by 
exposing services built on top of our Influence Graph; a directed graph of who 
is influencing whom on the web. PeerIndex gathers data from a number of social 
networks to create the Influence Graph. We use Storm to process our social 
data, to provide real-time aggregations, and to crawl the web, before storing 
our data in a manner most suitable for our Hadoop based systems to batch 
process. Storm provided us with an intuitive API and has slotted in well with 
the rest of our architecture. PeerIndex looks forward to further investing 
resources into our Storm based real-time analytics.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://ants.vn";>ANTS.VN</a>
+</td>
+<td>
+<p>
+Big Data in Advertising is Vietnam's unique platform that combines ad serving, a 
real-time bidding (RTB) exchange, an ad server, analytics, yield optimization, and 
content valuation to deliver the highest revenue across every desktop, tablet, 
and mobile screen. At ANTS.VN we use Storm to process large amounts of data in 
real time and improve our ad quality. This platform tracks 
impressions, clicks, conversions, bid requests etc. in real time. Together with 
Kafka, Redis, memcached and Cassandra-based messaging, Storm enables us to 
build low-latency, fault-tolerant distributed systems with ease.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wayfair.com";>Wayfair</a>
+</td>
+<td>
+<p>
+At Wayfair, we use Storm as a platform to drive our core order processing 
pipeline as an event-driven system. Storm allows us to reliably process tens of 
thousands of orders daily while providing us the assurance of seamless process 
scalability as our order load increases. Given the project's ease of use and 
the immense support of the community, we've managed to implement our bolts in 
PHP, construct a simple Puppet module for configuration management, and quickly 
solve arising issues. We can now focus most of our development efforts on the 
business layer. Find more information on how we use Storm <a 
href="http://engineering.wayfair.com/stormin-oms/";>in our engineering blog</a>. 
</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://innoquant.com/";>InnoQuant</a>
+</td>
+<td>
+<p>
+At InnoQuant, we use Storm as the backbone of our real-time big data analytics 
engine in the MOCA platform. MOCA is a next generation, mobile-backend-as-a-service 
platform (MBaaS). It provides brands and app developers with real-time in-app 
tracking, context-aware push messaging, user micro-segmentation based on 
profile, time and geo-context as well as big data analytics. The Storm-based 
pipeline is fed with events captured by native mobile SDKs (iOS, Android), 
scales nicely with connected mobile app users, delivers stream-based metrics 
and aggregations, and finally integrates with the rest of the MOCA infrastructure, 
including columnar storage (Cassandra) and graph storage (Titan).
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.fliptop.com/";>Fliptop</a>
+</td>
+<td>
+<p>
+Fliptop is a customer intelligence platform which allows customers to 
integrate their contacts and campaign data, to enhance their prospects with 
social identities, and to find their best leads and most influential 
customers. We have been using Storm for various tasks which require 
scalability and reliability, including integrating with sales/marketing 
platforms, data appending for contacts/leads, and computing scoring of 
contacts/leads. It's one of our most robust and scalable pieces of infrastructure.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.trovit.com/";>Trovit</a>
+</td>
+<td>
+<p>
+Trovit is a search engine for classified ads present in 39 countries and 
different business categories (Real Estate, Cars, Jobs, Rentals, Products and 
Deals). Currently we use Storm to process and index ads in a distributed and 
low latency fashion. Combining Storm with other technologies like Hadoop, HBase and 
Solr has allowed us to build a scalable and low latency platform to serve 
search results to the end user.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.openx.com/";>OpenX</a>
+</td>
+<td>
+<p>
+OpenX is a unique platform that combines ad serving, a real-time bidding (RTB) 
exchange, yield optimization, and content valuation to deliver the highest 
revenue across every desktop, tablet, and mobile screen.
+At OpenX we use Storm to process large amounts of data to provide real-time 
analytics. Storm lets us process data in real time to improve our ad 
quality.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://keen.io/";>Keen IO</a>
+</td>
+<td>
+<p>
+Keen IO is an analytics backend-as-a-service. The Keen IO API makes it easy 
for customers to do internal analytics or expose analytics features to their 
customers. Keen IO uses Storm (DRPC) to query billion-event data sets at very 
low latencies. We also use Storm to control our ingestion pipeline, sourcing 
data from Kafka and storing it in Cassandra.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://liveperson.com/";>LivePerson</a>
+</td>
+<td>
+<p>
+LivePerson is a provider of Interaction-Service over the web. Interaction 
between an agent and a visitor on a site can be achieved using phone call, chat, 
banners, etc. Using Storm, LivePerson can collect and process visitor data and 
provide information in real time to the agents about the visitor behavior. 
Moreover, LivePerson can make better decisions about how to react to visitors in 
a way that best addresses their needs.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://yieldbot.com/";>YieldBot</a>
+</td>
+<td>
+<p>
+Yieldbot connects ads to the real-time consumer intent streaming within 
premium publishers. To do this, Yieldbot leverages Storm for a wide variety of 
real-time processing tasks. We've open sourced our Clojure DSL for writing 
Trident topologies, marceline, which we use extensively. Events are read from 
Kafka, most state is stored in Cassandra, and we heavily use Storm's DRPC 
features. Our Storm use cases range from HTML processing, to hotness-style 
trending, to probabilistic rankings and cardinalities. Storm topologies touch 
virtually all of the events generated by the Yieldbot platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://equinix.com/";>Equinix</a>
+</td>
+<td>
+<p>
+At Equinix, we use a number of Storm topologies to process and persist various 
data streams generated by sensors in our data centers. We also use Storm for 
real-time monitoring of different infrastructure components. A few other 
topologies are used for processing logs in real-time for internal IT systems, 
which also provide insights into user behavior.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://minewhat.com/";>MineWhat</a>
+</td>
+<td>
+<p>
+MineWhat provides actionable analytics for ecommerce spanning every SKU, brand 
and category in the store. We use Storm to process raw click stream ingestion 
from Kafka and compute live analytics. Storm topologies power our complex 
product-to-user interaction analysis. The multi-language feature in Storm is really 
kick-ass: we have bolts written in Node.js, Python and Ruby. Storm has been in 
our production site since Nov 2012.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.360.cn/";>Qihoo 360</a>
+</td>
+<td>
+<p>
+360 has deployed about 50 realtime applications on top of Storm, including web 
page analysis, log processing, image processing, voice processing, etc.
+</p>
+<p>
+The use case of Storm at 360 is a bit special since we deployed Storm on 
thousands of servers which are not dedicated to Storm. Storm uses only a little 
CPU/memory/network resource on each server. However, these Storm clusters 
leverage idle resources of servers at nearly zero cost to provide great 
computing power, and in real time. It's amazing.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.holidaycheck.com/";>HolidayCheck</a>
+</td>
+<td>
+<p>
+HolidayCheck is an online travel site and agency available in 10
+languages worldwide visited by 30 million people a month.
+We use Storm to deliver real-time hotel and holiday package offers
+from multiple providers - reservation systems and affiliate travel
+networks - in a low latency fashion based on user-selected criteria.
+In further reservation steps we use DRPC for vacancy checks and
+bookings of chosen offers. Along with Storm in the system for offers
+delivery we use Scala, Akka, Hazelcast, Drools and MongoDB. Real-time
+offer stream is delivered outside of the system back to the front-end
+via websocket connections.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://dataminelab.com/";>DataMine Lab</a>
+</td>
+<td>
+<p>
+DataMine Lab is a consulting company integrating Storm into its
+portfolio of technologies. Storm powers a range of our customers'
+systems, allowing us to build real-time analytics on tens of millions
+of visitors to the advertising platforms we helped to create. Together
+with Redis, Cassandra and Hadoop, Storm allows us to provide a real-time
+distributed data platform at a global scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wizecommerce.com/";>Wize Commerce</a>
+</td>
+<td>
+<p>
+Wize Commerce® is the smartest way to grow your digital business. For over 
ten years, we have been helping clients maximize their revenue and traffic 
using optimization technologies that operate at massive scale, and across 
digital ecosystems. We own and operate leading comparison shopping engines 
including Nextag®, PriceMachine™, and <a 
href="http://guenstiger.de";>guenstiger.de</a>, and provide services to a wide 
ecosystem of partner sites that use our e-commerce platform. These sites 
together drive over $1B in annual merchant sales.
+</p>
+<p>
+We use Storm to power our core platform infrastructure and it has become a 
vital component of our search indexing system & Cassandra storage. Along with 
Kafka, Storm has reduced our end-to-end latencies from several hours to a few 
minutes. Being the largest comparison shopping sites operator, pushing price 
updates to the live site is very important to us, and Storm helps a lot to achieve 
that. We have been using Storm extensively in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://metamarkets.com";>Metamarkets</a>
+</td>
+<td>
+<p>At Metamarkets, Apache Storm is used to process real-time event data 
streamed from Apache Kafka message brokers, and then to load that data into a 
<a href="http://druid.io";>Druid cluster</a>, the low-latency data store at the 
heart of our real-time analytics service. Our Storm topologies perform various 
operations, ranging from simple filtering of "outdated" events, to 
transformations such as ID-to-name lookups, to complex multi-stream joins. 
Since our service is intended to respond to ad-hoc queries within seconds of 
ingesting events, the speed, flexibility, and robustness of those topologies 
make Storm a key piece of our real-time stack.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mightytravels.com";>Mighty Travels</a>
+</td>
+<td>
+<p>We are using Storm to process real-time search data stream and
+application logs. The part we like best about Storm is the ease of
+scaling up basically just by throwing more machines at it.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.polecat.co";>Polecat</a>
+</td>
+<td>
+<p>Polecat's digital analysis platform, MeaningMine, allows users to search 
all on-line news, blogs and social media in real-time and run bespoke analysis 
in order to inform corporate strategy and decision making for some of the world's 
largest companies and governmental organisations.</p>
+<p>
+Polecat uses Storm to run an application we've called the 'Data Munger'.  We 
run many different topologies on a multi host storm cluster to process tens of 
millions of online articles and posts that we collect each day.  Storm handles 
our analysis of these documents so that we can provide insight on realtime data 
to our clients.  We output our results from Storm into one of many large Apache 
Solr clusters for our end user applications to query (Polecat is also a 
contributor to Solr).  We first started developing our app to run on Storm 
back in June 2012 and it has been live since roughly September 2012.  We've 
found Storm to be an excellent fit for our needs here, and we've always found 
it extremely robust and fast.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.skylight.io/";>Skylight by Tilde</a>
+</td>
+<td>
+<p>Skylight is a production profiler for Ruby on Rails apps that focuses on 
providing detailed information about your running application that you can 
explore in an intuitive way. We use Storm to process traces from our agent into 
data structures that we can slice and dice for you in our web app.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ad4game.com/";>Ad4Game</a>
+</td>
+<td>
+<p>We are an advertising network and we use Storm to calculate priorities in 
real time to know which ads to show for which website, visitor and country.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.impetus.com/";>Impetus Technologies</a>
+</td>
+<td>
+<p>StreamAnalytix, a product of Impetus Technologies, enables enterprises to 
analyze and respond to events in real-time at Big Data scale. Based on Apache 
Storm, StreamAnalytix is designed to rapidly build and deploy streaming 
analytics applications for any industry vertical, any data format, and any use 
case. This high-performance scalable platform comes with a pre-integrated 
package of components like Cassandra, Storm, Kafka and more. In addition, it 
also brings together the proven open source technology stack with Hadoop and 
NoSQL to provide massive scalability, dynamic data pipelines, and a visual 
designer for rapid application development.</p>
+<p>
+Through StreamAnalytix, the users can ingest, store and analyze millions of 
events per second and discover exceptions, patterns, and trends through live 
dashboards. It also provides seamless integration with indexing store 
(ElasticSearch) and NoSQL database (HBase, Cassandra, and Oracle NoSQL) for 
writing data in real-time. With the use of Storm, the product delivers high 
business value solutions such as log analytics, streaming ETL, deep social 
listening, Real-time marketing, business process acceleration and predictive 
maintenance.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.akazoo.com/en";>Akazoo</a>
+</td>
+<td>
+<p>
+Akazoo is a platform providing music streaming services.  Storm is the 
backbone of all our real-time analytical processing. We use it for tracking and 
analyzing application events and for various other stuff, including 
recommendations and parallel task execution.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mapillary.com";>Mapillary</a>
+</td>
+<td>
+<p>
+At Mapillary we use Storm for a wide variety of tasks. Having a system which 
is 100% based on Kafka input, Storm, and Trident makes reasoning about our data a 
breeze.  
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.gutscheinrausch.de/";>Gutscheinrausch.de</a>
+</td>
+<td>
+<p>
+We recently upgraded our existing IT infrastructure, using Storm as one of our 
main tools.
+Each day we collect sales, clicks, visits and various ecommerce metrics from 
various different systems (webpages, affiliate reportings, networks, 
tracking-scripts etc). We process this continually generated data using Storm 
before entering it into the backend systems for further use.
+</p>
+<p>
+Using Storm we were able to decouple our heterogeneous frontend-systems from 
our backends and take load off the data warehouse applications by inputting 
pre-processed data. This way we can easily collect and process all data and then 
do realtime OLAP queries using our proprietary data warehouse technology.
+</p>
+<p>
+We are mostly impressed by the high speed, low maintenance approach Storm has 
provided us with. Also being able to easily scale up the system using more 
machines is a big plus. Since we're a small team it allows us to focus more on 
our core business instead of the underlying technology. You could say it has 
taken our hearts by storm!
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.appriver.com";>AppRiver</a>
+</td>
+<td>
+<p>
+We are using Storm to track internet threats from varied sources around the 
web.  It is always fast and reliable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mercadolibre.com/";>MercadoLibre</a>
+</td>
+<td>
+</td>
+</tr>
+
+
+</table>

Modified: storm/branches/bobby-versioned-site/getting-help.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/getting-help.md?rev=1735516&r1=1735515&r2=1735516&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/getting-help.md (original)
+++ storm/branches/bobby-versioned-site/getting-help.md Thu Mar 17 22:48:32 2016
@@ -1,7 +1,8 @@
 ---
 layout: default
-title: Getting help
+title: Documentation
 ---
+### Getting help
 
 __NOTE:__ The google groups account [email protected] is now 
officially deprecated in favor of the Apache-hosted user/dev mailing lists.
 

Modified: storm/branches/bobby-versioned-site/index.html
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/index.html?rev=1735516&r1=1735515&r2=1735516&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/index.html (original)
+++ storm/branches/bobby-versioned-site/index.html Thu Mar 17 22:48:32 2016
@@ -90,7 +90,7 @@ title: Apache Storm
                     <a href="http://www.wego.com/";><img 
src="images/logos/wego.jpg" class="img-responsive"></a>
                 </div>
                 <div>
-                  <a href="/documentation/Powered-By.html" target="blank" 
class="pull-right" style="font-size: 18px;">and many others</a>
+                  <a href="/Powered-By.html" target="blank" class="pull-right" 
style="font-size: 18px;">and many others</a>
                 </div>
             </div>
         </div>

Added: storm/branches/bobby-versioned-site/releases/0.10.0/flux.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/flux.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/flux.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/flux.md Thu Mar 17 
22:48:32 2016
@@ -0,0 +1,836 @@
+---
+title: Flux
+layout: documentation
+documentation: true
+version: v0.10.0
+---
+
+A framework for creating and deploying Apache Storm streaming computations 
with less friction.
+
+## Definition
+**flux** |fləks| _noun_
+
+1. The action or process of flowing or flowing out
+2. Continuous change
+3. In physics, the rate of flow of a fluid, radiant energy, or particles 
across a given area
+4. A substance mixed with a solid to lower its melting point
+
+## Rationale
+Bad things happen when configuration is hard-coded. No one should have to 
recompile or repackage an application in
+order to change configuration.
+
+## About
+Flux is a framework and set of utilities that make defining and deploying 
Apache Storm topologies less painful and
+developer-intensive.
+
+Have you ever found yourself repeating this pattern?:
+
+```java
+
+public static void main(String[] args) throws Exception {
+    // logic to determine if we're running locally or not...
+    // create necessary config options...
+    boolean runLocal = shouldRunLocal();
+    if(runLocal){
+        LocalCluster cluster = new LocalCluster();
+        cluster.submitTopology(name, conf, topology);
+    } else {
+        StormSubmitter.submitTopology(name, conf, topology);
+    }
+}
+```
+
+Wouldn't something like this be easier:
+
+```bash
+storm jar mytopology.jar org.apache.storm.flux.Flux --local config.yaml
+```
+
+or:
+
+```bash
+storm jar mytopology.jar org.apache.storm.flux.Flux --remote config.yaml
+```
+
+Another pain point often mentioned is the fact that the wiring for a Topology 
graph is often tied up in Java code,
+and that any changes require recompilation and repackaging of the topology jar 
file. Flux aims to alleviate that
+pain by allowing you to package all your Storm components in a single jar, and 
use an external text file to define
+the layout and configuration of your topologies.
+
+## Features
+
+ * Easily configure and deploy Storm topologies (Both Storm core and 
Microbatch API) without embedding configuration
+   in your topology code
+ * Support for existing topology code (see below)
+ * Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
+ * YAML DSL support for most Storm components (storm-kafka, storm-hdfs, 
storm-hbase, etc.)
+ * Convenient support for multi-lang components
+ * External property substitution/filtering for easily switching between 
configurations/environments (similar to Maven-style
+   `${variable.name}` substitution)
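
The Maven-style `${variable.name}` substitution mentioned in the last bullet can be pictured with a minimal sketch. This is not Flux's actual implementation: the class name `PropertySubstitutionSketch`, the method `substitute`, the sample key `kafka.zookeeper.hosts`, and the choice to leave unknown keys untouched are all assumptions made for illustration.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not part of Flux or Storm.
public class PropertySubstitutionSketch {
    // Matches ${some.key} and captures the key between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} with its value from props.
    // Assumption: keys missing from props are left in place unchanged.
    public static String substitute(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1), m.group(0));
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("kafka.zookeeper.hosts", "localhost:2181"); // hypothetical key
        System.out.println(substitute("zkHosts: \"${kafka.zookeeper.hosts}\"", props));
    }
}
```

Keeping environment-specific values in a separate properties file like this means the same YAML topology definition could be reused across development and production by swapping only the properties file.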
+
+## Usage
+
+To use Flux, add it as a dependency and package all your Storm components in a 
fat jar, then create a YAML document
+to define your topology (see below for YAML configuration options).
+
+### Building from Source
+The easiest way to use Flux is to add it as a Maven dependency in your project 
as described below.
+
+If you would like to build Flux from source and run the unit/integration 
tests, you will need the following installed
+on your system:
+
+* Python 2.6.x or later
+* Node.js 0.10.x or later
+
+#### Building with unit tests enabled:
+
+```
+mvn clean install
+```
+
+#### Building with unit tests disabled:
+If you would like to build Flux without installing Python or Node.js you can 
simply skip the unit tests:
+
+```
+mvn clean install -DskipTests=true
+```
+
+Note that if you plan on using Flux to deploy topologies to a remote cluster, 
you will still need to have Python
+installed since it is required by Apache Storm.
+
+
+#### Building with integration tests enabled:
+
+```
+mvn clean install -DskipIntegration=false
+```
+
+
+### Packaging with Maven
+To enable Flux for your Storm components, you need to add it as a dependency 
such that it's included in the Storm
+topology jar. This can be accomplished with the Maven shade plugin (preferred) 
or the Maven assembly plugin (not
+recommended).
+
+#### Flux Maven Dependency
+The current version of Flux is available in Maven Central at the following 
coordinates:
+```xml
+<dependency>
+    <groupId>org.apache.storm</groupId>
+    <artifactId>flux-core</artifactId>
+    <version>${storm.version}</version>
+</dependency>
+```
+
+#### Creating a Flux-Enabled Topology JAR
+The example below illustrates Flux usage with the Maven shade plugin:
+
+ ```xml
+<!-- include Flux and user dependencies in the shaded jar -->
+<dependencies>
+    <!-- Flux include -->
+    <dependency>
+        <groupId>org.apache.storm</groupId>
+        <artifactId>flux-core</artifactId>
+        <version>${storm.version}</version>
+    </dependency>
+
+    <!-- add user dependencies here... -->
+
+</dependencies>
+<!-- create a fat jar that includes all dependencies -->
+<build>
+    <plugins>
+        <plugin>
+            <groupId>org.apache.maven.plugins</groupId>
+            <artifactId>maven-shade-plugin</artifactId>
+            <version>1.4</version>
+            <configuration>
+                <createDependencyReducedPom>true</createDependencyReducedPom>
+            </configuration>
+            <executions>
+                <execution>
+                    <phase>package</phase>
+                    <goals>
+                        <goal>shade</goal>
+                    </goals>
+                    <configuration>
+                        <transformers>
+                            <transformer
+                                    
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+                            <transformer
+                                    
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+                                
<mainClass>org.apache.storm.flux.Flux</mainClass>
+                            </transformer>
+                        </transformers>
+                    </configuration>
+                </execution>
+            </executions>
+        </plugin>
+    </plugins>
+</build>
+ ```
+
+### Deploying and Running a Flux Topology
+Once your topology components are packaged with the Flux dependency, you can 
run different topologies either locally
+or remotely using the `storm jar` command. For example, if your fat jar is 
named `myTopology-0.1.0-SNAPSHOT.jar` you
+could run it locally with the command:
+
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local 
my_config.yaml
+
+```
+
+### Command line options
+```
+usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux
+             [options] <topology-config.yaml>
+ -d,--dry-run                 Do not run or deploy the topology. Just
+                              build, validate, and print information about
+                              the topology.
+ -e,--env-filter              Perform environment variable substitution.
+                              Replace keys identified with `${ENV-[NAME]}`
+                              will be replaced with the corresponding
+                              `NAME` environment value
+ -f,--filter <file>           Perform property substitution. Use the
+                              specified file as a source of properties,
+                              and replace keys identified with {$[property
+                              name]} with the value defined in the
+                              properties file.
+ -i,--inactive                Deploy the topology, but do not activate it.
+ -l,--local                   Run the topology in local mode.
+ -n,--no-splash               Suppress the printing of the splash screen.
+ -q,--no-detail               Suppress the printing of topology details.
+ -r,--remote                  Deploy the topology to a remote cluster.
+ -R,--resource                Treat the supplied path as a classpath
+                              resource instead of a file.
+ -s,--sleep <ms>              When running locally, the amount of time to
+                              sleep (in ms.) before killing the topology
+                              and shutting down the local cluster.
+ -z,--zookeeper <host:port>   When running in local mode, use the
+                              ZooKeeper at the specified <host>:<port>
+                              instead of the in-process ZooKeeper.
+                              (requires Storm 0.9.3 or later)
+```
+
+**NOTE:** Flux tries to avoid command line switch collision with the `storm` 
command, and allows any other command line
+switches to pass through to the `storm` command.
+
+For example, you can use the `storm` command switch `-c` to override a 
topology configuration property. The following
+example command will run Flux and override the `nimbus.seeds` configuration:
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote 
my_config.yaml -c 'nimbus.seeds=["localhost"]'
+```
+
+### Sample output
+```
+███████╗██╗     ██╗   ██╗██╗  ██╗
+██╔════╝██║     ██║   ██║╚██╗██╔╝
+█████╗  ██║     ██║   ██║ ╚███╔╝
+██╔══╝  ██║     ██║   ██║ ██╔██╗
+██║     ███████╗╚██████╔╝██╔╝ ██╗
+╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝
++-         Apache Storm        -+
++-  data FLow User eXperience  -+
+Version: 0.3.0
+Parsing file: /Users/hsimpson/Projects/donut_domination/storm/shell_test.yaml
+---------- TOPOLOGY DETAILS ----------
+Name: shell-topology
+--------------- SPOUTS ---------------
+sentence-spout[1](org.apache.storm.flux.spouts.GenericShellSpout)
+---------------- BOLTS ---------------
+splitsentence[1](org.apache.storm.flux.bolts.GenericShellBolt)
+log[1](org.apache.storm.flux.wrappers.bolts.LogInfoBolt)
+count[1](backtype.storm.testing.TestWordCounter)
+--------------- STREAMS ---------------
+sentence-spout --SHUFFLE--> splitsentence
+splitsentence --FIELDS--> count
+count --SHUFFLE--> log
+--------------------------------------
+Submitting topology: 'shell-topology' to remote cluster...
+```
+
+## YAML Configuration
+A Flux topology is defined in a YAML file. A Flux topology
+definition consists of the following:
+
+  1. A topology name
+  2. A list of topology "components" (named Java objects that will be made 
available in the environment)
+  3. **EITHER** (A DSL topology definition):
+      * A list of spouts, each identified by a unique ID
+      * A list of bolts, each identified by a unique ID
+      * A list of "stream" objects representing a flow of tuples between 
spouts and bolts
+  4. **OR** (A JVM class that can produce a `backtype.storm.generated.StormTopology` instance):
+      * A `topologySource` definition.
+
+
+
+For example, here is a simple definition of a wordcount topology using the 
YAML DSL:
+
+```yaml
+name: "yaml-topology"
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "spout-1"
+    className: "backtype.storm.testing.TestWordSpout"
+    parallelism: 1
+
+# bolt definitions
+bolts:
+  - id: "bolt-1"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+  - id: "bolt-2"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+
+#stream definitions
+streams:
+  - name: "spout-1 --> bolt-1" # name isn't used (placeholder for logging, UI, 
etc.)
+    from: "spout-1"
+    to: "bolt-1"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "bolt-1 --> bolt2"
+    from: "bolt-1"
+    to: "bolt-2"
+    grouping:
+      type: SHUFFLE
+
+
+```
+## Property Substitution/Filtering
+It's common for developers to want to easily switch between configurations, 
for example switching deployment between
+a development environment and a production environment. This can be 
accomplished by using separate YAML configuration
+files, but that approach would lead to unnecessary duplication, especially in 
situations where the Storm topology
+does not change, but configuration settings such as host names, ports, and parallelism parameters do.
+
+For this case, Flux offers property filtering, which allows you to externalize values to a `.properties` file and have
+them substituted before the `.yaml` file is parsed.
+
+To enable property filtering, use the `--filter` command line option and 
specify a `.properties` file. For example,
+if you invoked Flux like so:
+
+```bash
+storm jar myTopology-0.1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local 
my_config.yaml --filter dev.properties
+```
+With the following `dev.properties` file:
+
+```properties
+kafka.zookeeper.hosts: localhost:2181
+```
+
+You would then be able to reference those properties by key in your `.yaml` 
file using `${}` syntax:
+
+```yaml
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "${kafka.zookeeper.hosts}"
+```
+
+In this case, Flux would replace `${kafka.zookeeper.hosts}` with 
`localhost:2181` before parsing the YAML contents.
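
The substitution step described above can be pictured as a plain text replacement that runs before any YAML parsing. The following is a minimal sketch, not Flux's actual implementation: the class name `PropertyFilterSketch` and the choice to leave unknown keys untouched are assumptions made for illustration.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is not Flux's actual implementation.
final class PropertyFilterSketch {

    // Matches ${some.key}; the key is captured in group 1.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} that is defined in props.
    // Assumption: keys that are not defined are left untouched.
    static String substitute(String yamlText, Properties props) {
        Matcher m = TOKEN.matcher(yamlText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("kafka.zookeeper.hosts", "localhost:2181");
        String yaml = "constructorArgs:\n  - \"${kafka.zookeeper.hosts}\"\n";
        System.out.print(substitute(yaml, props));
    }
}
```

Because the replacement happens on the raw text, the YAML parser only ever sees the substituted values.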
+
+### Environment Variable Substitution/Filtering
+Flux also allows environment variable substitution. For example, if an environment variable named `ZK_HOSTS` is defined,
+you can reference it in a Flux YAML file with the following syntax:
+
+```
+${ENV-ZK_HOSTS}
+```
+
+## Components
+Components are essentially named object instances that are made available as 
configuration options for spouts and
+bolts. If you are familiar with the Spring framework, components are roughly analogous to Spring beans.
+
+Every component is identified, at a minimum, by a unique identifier (String) 
and a class name (String). For example,
+the following will make an instance of the `storm.kafka.StringScheme` class available as a reference under the key
+`"stringScheme"`. This assumes that `storm.kafka.StringScheme` has a default constructor.
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+```
+
+### Constructor Arguments, References, Properties and Configuration Methods
+
+####Constructor Arguments
+Arguments to a class constructor can be configured by adding a `constructorArgs` element to a component.
+`constructorArgs` is a list of objects that will be passed to the class's constructor. The following example creates an
+object by calling the constructor that takes a single string as an argument:
+
+```yaml
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "localhost:2181"
+```
+
+####References
+Each component instance is identified by a unique id that allows it to be 
used/reused by other components. To
+reference an existing component, you specify the id of the component with the 
`ref` tag.
+
+In the following example, a component with the id `"stringScheme"` is created, and later referenced as an argument
+to another component's constructor:
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+
+  - id: "stringMultiScheme"
+    className: "backtype.storm.spout.SchemeAsMultiScheme"
+    constructorArgs:
+      - ref: "stringScheme" # component with id "stringScheme" must be 
declared above.
+```
+**N.B.:** References can only be used after (below) the object they point to 
has been declared.
+
+####Properties
+In addition to calling constructors with different arguments, Flux also allows 
you to configure components using
+JavaBean-like setter methods and fields declared as `public`:
+
+```yaml
+  - id: "spoutConfig"
+    className: "storm.kafka.SpoutConfig"
+    constructorArgs:
+      # brokerHosts
+      - ref: "zkHosts"
+      # topic
+      - "myKafkaTopic"
+      # zkRoot
+      - "/kafkaSpout"
+      # id
+      - "myId"
+    properties:
+      - name: "forceFromStart"
+        value: true
+      - name: "scheme"
+        ref: "stringMultiScheme"
+```
+
+In the example above, the `properties` declaration will cause Flux to look for a public method in the `SpoutConfig` class with
+the signature `setForceFromStart(boolean b)` and attempt to invoke it. If a setter method is not found, Flux will then
+look for a public instance variable with the name `forceFromStart` and attempt to set its value.
+
+References may also be used as property values.
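
The lookup order described above (setter first, then public field) can be sketched with plain Java reflection. This is a simplified illustration rather than Flux's own code: `PropertySetterSketch` and `DemoConfig` are hypothetical names, and `DemoConfig` merely stands in for a class such as `SpoutConfig`.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Illustrative sketch only; this is not Flux's actual implementation.
final class PropertySetterSketch {

    // Hypothetical stand-in for a configurable class such as SpoutConfig.
    public static class DemoConfig {
        public String scheme;              // public field without a setter
        private boolean forceFromStart;    // reachable only through its setter

        public void setForceFromStart(boolean b) { this.forceFromStart = b; }
        public boolean isForceFromStart() { return forceFromStart; }
    }

    // Tries a public one-argument setter first, then falls back to a public field.
    static void applyProperty(Object target, String name, Object value) {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    m.invoke(target, value);
                    return;
                }
            }
            Field field = target.getClass().getField(name); // finds public fields only
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot set property '" + name + "'", e);
        }
    }

    public static void main(String[] args) {
        DemoConfig config = new DemoConfig();
        applyProperty(config, "forceFromStart", true);        // found via setForceFromStart(boolean)
        applyProperty(config, "scheme", "stringMultiScheme"); // no setter, so the public field is set
        System.out.println(config.isForceFromStart() + " " + config.scheme);
    }
}
```

In a real topology the `scheme` value would be the referenced component instance rather than a string; a string is used here only to keep the sketch self-contained.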
+
+####Configuration Methods
+Conceptually, configuration methods are similar to Properties and Constructor 
Args -- they allow you to invoke an
+arbitrary method on an object after it is constructed. Configuration methods 
are useful for working with classes that
+don't expose JavaBean methods or have constructors that can fully configure 
the object. Common examples include classes
+that use the builder pattern for configuration/composition.
+
+The following YAML example creates a bolt and configures it by calling several 
methods:
+
+```yaml
+bolts:
+  - id: "bolt-1"
+    className: "org.apache.storm.flux.test.TestBolt"
+    parallelism: 1
+    configMethods:
+      - name: "withFoo"
+        args:
+          - "foo"
+      - name: "withBar"
+        args:
+          - "bar"
+      - name: "withFooBar"
+        args:
+          - "foo"
+          - "bar"
+```
+
+The signatures of the corresponding methods are as follows:
+
+```java
+    public void withFoo(String foo);
+    public void withBar(String bar);
+    public void withFooBar(String foo, String bar);
+```
+
+Arguments passed to configuration methods work much the same way as 
constructor arguments, and support references as
+well.
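
To make the `configMethods` mechanism more concrete, here is a hedged sketch of invoking methods by name after construction. It is not taken from Flux: `ConfigMethodSketch` and `DemoBolt` are hypothetical, and matching on method name plus argument count is an assumption chosen to keep the example short.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not Flux's actual implementation.
final class ConfigMethodSketch {

    // Hypothetical stand-in for TestBolt; it only records the calls it receives.
    public static class DemoBolt {
        public final List<String> calls = new ArrayList<>();

        public void withFoo(String foo) { calls.add("withFoo(" + foo + ")"); }
        public void withBar(String bar) { calls.add("withBar(" + bar + ")"); }
        public void withFooBar(String foo, String bar) { calls.add("withFooBar(" + foo + "," + bar + ")"); }
    }

    // Invokes the first public method whose name and argument count match.
    static void callConfigMethod(Object target, String name, Object... args) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                try {
                    m.invoke(target, args);
                    return;
                } catch (ReflectiveOperationException e) {
                    throw new IllegalArgumentException("Could not invoke " + name, e);
                }
            }
        }
        throw new IllegalArgumentException("No method " + name + " taking " + args.length + " argument(s)");
    }

    public static void main(String[] args) {
        DemoBolt bolt = new DemoBolt();
        // Mirrors the three configMethods entries in the YAML example.
        callConfigMethod(bolt, "withFoo", "foo");
        callConfigMethod(bolt, "withBar", "bar");
        callConfigMethod(bolt, "withFooBar", "foo", "bar");
        System.out.println(bolt.calls);
    }
}
```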
+
+### Using Java `enum`s in Constructor Arguments, References, Properties and Configuration Methods
+You can easily use Java `enum` values as arguments in a Flux YAML file, simply 
by referencing the name of the `enum`.
+
+For example, [Storm's HDFS module]() includes the following `enum` definition 
(simplified for brevity):
+
+```java
+public static enum Units {
+    KB, MB, GB, TB
+}
+```
+
+And the `org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy` class has 
the following constructor:
+
+```java
+public FileSizeRotationPolicy(float count, Units units)
+
+```
+The following Flux `component` definition could be used to call the 
constructor:
+
+```yaml
+  - id: "rotationPolicy"
+    className: "org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy"
+    constructorArgs:
+      - 5.0
+      - MB
+```
+
+The above definition is functionally equivalent to the following Java code:
+
+```java
+// rotate files when they reach 5MB
+FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
+```
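
One way to picture how a bare YAML string such as `MB` could end up as an enum constant is a name lookup against the parameter type. The sketch below is an assumption about the general technique, not a description of Flux internals; `EnumArgSketch` is a hypothetical name and its nested `Units` enum is a stand-in for the one shown above.

```java
// Illustrative sketch only; this is not Flux's actual implementation.
final class EnumArgSketch {

    // Stand-in for the Units enum shown above.
    public enum Units { KB, MB, GB, TB }

    // If the parameter type is an enum, resolve the string to the constant with that name;
    // otherwise return the value unchanged.
    static Object coerce(Class<?> parameterType, Object yamlValue) {
        if (!parameterType.isEnum() || !(yamlValue instanceof String)) {
            return yamlValue;
        }
        for (Object constant : parameterType.getEnumConstants()) {
            if (((Enum<?>) constant).name().equals(yamlValue)) {
                return constant;
            }
        }
        throw new IllegalArgumentException(
                "No constant '" + yamlValue + "' in " + parameterType.getName());
    }

    public static void main(String[] args) {
        Object units = coerce(Units.class, "MB");
        System.out.println(units == Units.MB);
    }
}
```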
+
+## Topology Config
+The `config` section is simply a map of Storm topology configuration 
parameters that will be passed to the
+`backtype.storm.StormSubmitter` as an instance of the `backtype.storm.Config` 
class:
+
+```yaml
+config:
+  topology.workers: 4
+  topology.max.spout.pending: 1000
+  topology.message.timeout.secs: 30
+```
+
+# Existing Topologies
+If you have existing Storm topologies, you can still use Flux to 
deploy/run/test them. This feature allows you to
+leverage Flux Constructor Arguments, References, Properties, and Topology 
Config declarations for existing topology
+classes.
+
+The easiest way to use an existing topology class is to define
+a `getTopology()` instance method with one of the following signatures:
+
+```java
+public StormTopology getTopology(Map<String, Object> config)
+```
+or:
+
+```java
+public StormTopology getTopology(Config config)
+```
+
+You could then use the following YAML to configure your topology:
+
+```yaml
+name: "existing-topology"
+topologySource:
+  className: "org.apache.storm.flux.test.SimpleTopology"
+```
+
+If the class you would like to use as a topology source has a different method 
name (i.e. not `getTopology`), you can
+override it:
+
+```yaml
+name: "existing-topology"
+topologySource:
+  className: "org.apache.storm.flux.test.SimpleTopology"
+  methodName: "getTopologyWithDifferentMethodName"
+```
+
+__N.B.:__ The specified method must accept a single argument of type 
`java.util.Map<String, Object>` or
+`backtype.storm.Config`, and return a `backtype.storm.generated.StormTopology` 
object.
+
+# YAML DSL
+## Spouts and Bolts
+Spouts and Bolts are configured in their own respective sections of the YAML configuration. Spout and Bolt definitions
+are extensions to the `component` definition that add a `parallelism` parameter that sets the parallelism for a
+component when the topology is deployed.
+
+Because spout and bolt definitions extend `component` they support constructor 
arguments, references, and properties as
+well.
+
+Shell spout example:
+
+```yaml
+spouts:
+  - id: "sentence-spout"
+    className: "org.apache.storm.flux.spouts.GenericShellSpout"
+    # shell spout constructor takes 2 arguments: String[], String[]
+    constructorArgs:
+      # command line
+      - ["node", "randomsentence.js"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+```
+
+Kafka spout example:
+
+```yaml
+components:
+  - id: "stringScheme"
+    className: "storm.kafka.StringScheme"
+
+  - id: "stringMultiScheme"
+    className: "backtype.storm.spout.SchemeAsMultiScheme"
+    constructorArgs:
+      - ref: "stringScheme"
+
+  - id: "zkHosts"
+    className: "storm.kafka.ZkHosts"
+    constructorArgs:
+      - "localhost:2181"
+
+# Alternative kafka config
+#  - id: "kafkaConfig"
+#    className: "storm.kafka.KafkaConfig"
+#    constructorArgs:
+#      # brokerHosts
+#      - ref: "zkHosts"
+#      # topic
+#      - "myKafkaTopic"
+#      # clientId (optional)
+#      - "myKafkaClientId"
+
+  - id: "spoutConfig"
+    className: "storm.kafka.SpoutConfig"
+    constructorArgs:
+      # brokerHosts
+      - ref: "zkHosts"
+      # topic
+      - "myKafkaTopic"
+      # zkRoot
+      - "/kafkaSpout"
+      # id
+      - "myId"
+    properties:
+      - name: "forceFromStart"
+        value: true
+      - name: "scheme"
+        ref: "stringMultiScheme"
+
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "kafka-spout"
+    className: "storm.kafka.KafkaSpout"
+    constructorArgs:
+      - ref: "spoutConfig"
+
+```
+
+Bolt Examples:
+
+```yaml
+# bolt definitions
+bolts:
+  - id: "splitsentence"
+    className: "org.apache.storm.flux.bolts.GenericShellBolt"
+    constructorArgs:
+      # command line
+      - ["python", "splitsentence.py"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+    # ...
+
+  - id: "log"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+    # ...
+
+  - id: "count"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+    # ...
+```
+## Streams and Stream Groupings
+Streams in Flux are represented as a list of connections (Graph edges, data 
flow, etc.) between the Spouts and Bolts in
+a topology, with an associated Grouping definition.
+
+A Stream definition has the following properties:
+
+**`name`:** A name for the connection (optional, currently unused)
+
+**`from`:** The `id` of a Spout or Bolt that is the source (publisher)
+
+**`to`:** The `id` of a Bolt that is the destination (subscriber)
+
+**`grouping`:** The stream grouping definition for the Stream
+
+A Grouping definition has the following properties:
+
+**`type`:** The type of grouping. One of 
`ALL`,`CUSTOM`,`DIRECT`,`SHUFFLE`,`LOCAL_OR_SHUFFLE`,`FIELDS`,`GLOBAL`, or 
`NONE`.
+
+**`streamId`:** The Storm stream ID (Optional. If unspecified will use the 
default stream)
+
+**`args`:** For the `FIELDS` grouping, a list of field names.
+
+**`customClass`:** For the `CUSTOM` grouping, a definition of the custom grouping class instance
+
+The `streams` definition example below sets up a topology with the following 
wiring:
+
+```
+    kafka-spout --> splitsentence --> count --> log
+```
+
+
+```yaml
+#stream definitions
+# stream definitions define connections between spouts and bolts.
+# note that such connections can be cyclical
+# custom stream groupings are also supported
+
+streams:
+  - name: "kafka --> split" # name isn't used (placeholder for logging, UI, 
etc.)
+    from: "kafka-spout"
+    to: "splitsentence"
+    grouping:
+      type: SHUFFLE
+
+  - name: "split --> count"
+    from: "splitsentence"
+    to: "count"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "count --> log"
+    from: "count"
+    to: "log"
+    grouping:
+      type: SHUFFLE
+```
+
+### Custom Stream Groupings
+Custom stream groupings are defined by setting the grouping type to `CUSTOM` 
and defining a `customClass` parameter
+that tells Flux how to instantiate the custom class. The `customClass` 
definition extends `component`, so it supports
+constructor arguments, references, and properties as well.
+
+The example below creates a Stream with an instance of the 
`backtype.storm.testing.NGrouping` custom stream grouping
+class.
+
+```yaml
+  - name: "bolt-1 --> bolt2"
+    from: "bolt-1"
+    to: "bolt-2"
+    grouping:
+      type: CUSTOM
+      customClass:
+        className: "backtype.storm.testing.NGrouping"
+        constructorArgs:
+          - 1
+```
+
+## Includes and Overrides
+Flux allows you to include the contents of other YAML files, and have them 
treated as though they were defined in the
+same file. Includes may be either files, or classpath resources.
+
+Includes are specified as a list of maps:
+
+```yaml
+includes:
+  - resource: false
+    file: "src/test/resources/configs/shell_test.yaml"
+    override: false
+```
+
+If the `resource` property is set to `true`, the include will be loaded as a 
classpath resource from the value of the
+`file` attribute, otherwise it will be treated as a regular file.
+
+The `override` property controls how includes affect the values defined in the 
current file. If `override` is set to
+`true`, values in the included file will replace values in the current file 
being parsed. If `override` is set to
+`false`, values in the current file being parsed will take precedence, and the 
parser will refuse to replace them.
+
+**N.B.:** Includes are not yet recursive. Includes from included files will be 
ignored.
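
The effect of the `override` flag described above can be sketched as a map merge. This is only an illustration under stated assumptions: `IncludeMergeSketch` is a hypothetical name, and merging at the level of top-level keys is a simplification, since the text does not specify the granularity Flux uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; this is not Flux's actual implementation.
final class IncludeMergeSketch {

    // Merges the values of an included file into those of the current file.
    // override == true:  included values replace current values.
    // override == false: current values win; only missing keys are added.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> included,
                                     boolean override) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Object> entry : included.entrySet()) {
            if (override) {
                result.put(entry.getKey(), entry.getValue());
            } else {
                result.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("name", "current-topology");

        Map<String, Object> included = new LinkedHashMap<>();
        included.put("name", "shell-topology");
        included.put("topology.workers", 1);

        System.out.println(merge(current, included, true));
        System.out.println(merge(current, included, false));
    }
}
```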
+
+
+## Basic Word Count Example
+
+This example uses a spout implemented in JavaScript, a bolt implemented in Python, and a bolt implemented in Java.
+
+Topology YAML config:
+
+```yaml
+---
+name: "shell-topology"
+config:
+  topology.workers: 1
+
+# spout definitions
+spouts:
+  - id: "sentence-spout"
+    className: "org.apache.storm.flux.spouts.GenericShellSpout"
+    # shell spout constructor takes 2 arguments: String[], String[]
+    constructorArgs:
+      # command line
+      - ["node", "randomsentence.js"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+
+# bolt definitions
+bolts:
+  - id: "splitsentence"
+    className: "org.apache.storm.flux.bolts.GenericShellBolt"
+    constructorArgs:
+      # command line
+      - ["python", "splitsentence.py"]
+      # output fields
+      - ["word"]
+    parallelism: 1
+
+  - id: "log"
+    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
+    parallelism: 1
+
+  - id: "count"
+    className: "backtype.storm.testing.TestWordCounter"
+    parallelism: 1
+
+#stream definitions
+# stream definitions define connections between spouts and bolts.
+# note that such connections can be cyclical
+# custom stream groupings are also supported
+
+streams:
+  - name: "spout --> split" # name isn't used (placeholder for logging, UI, 
etc.)
+    from: "sentence-spout"
+    to: "splitsentence"
+    grouping:
+      type: SHUFFLE
+
+  - name: "split --> count"
+    from: "splitsentence"
+    to: "count"
+    grouping:
+      type: FIELDS
+      args: ["word"]
+
+  - name: "count --> log"
+    from: "count"
+    to: "log"
+    grouping:
+      type: SHUFFLE
+```
+
+
+## Micro-Batching (Trident) API Support
+Currently, the Flux YAML DSL only supports the Core Storm API, but support for Storm's micro-batching API is planned.
+
+To use Flux with a Trident topology, define a topology getter method and 
reference it in your YAML config:
+
+```yaml
+name: "my-trident-topology"
+
+config:
+  topology.workers: 1
+
+topologySource:
+  className: "org.apache.storm.flux.test.TridentTopologySource"
+  # Flux will look for "getTopology", this will override that.
+  methodName: "getTopologyWithDifferentMethodName"
+```

Added: storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md
URL: 
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md?rev=1735516&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md 
(added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/storm-eventhubs.md Thu 
Mar 17 22:48:32 2016
@@ -0,0 +1,41 @@
+---
+title: Azure Event Hubs Integration
+layout: documentation
+documentation: true
+version: v0.10.0
+---
+
+Storm spout and bolt implementation for Microsoft Azure Eventhubs
+
+### build ###
+       mvn clean package
+
+### run sample topology ###
+To run the sample topology, you need to modify the config.properties file with
+the eventhubs configurations. Here is an example:
+
+       eventhubspout.username = [username: policy name in EventHubs Portal]
+       eventhubspout.password = [password: shared access key in EventHubs 
Portal]
+       eventhubspout.namespace = [namespace]
+       eventhubspout.entitypath = [entitypath]
+       eventhubspout.partitions.count = [partitioncount]
+
+       # if not provided, will use storm's zookeeper settings
+       # 
zookeeper.connectionstring=zookeeper0:2181,zookeeper1:2181,zookeeper2:2181
+
+       eventhubspout.checkpoint.interval = 10
+       eventhub.receiver.credits = 1024
+
+Then you can use storm.cmd to submit the sample topology:
+
+       storm jar {jarfile} com.microsoft.eventhubs.samples.EventCount {topologyname} {spoutconffile}
+
+where {jarfile} should be eventhubs-storm-spout-{version}-jar-with-dependencies.jar.
+
+### Run EventHubSendClient ###
+We have included a simple EventHubs send client for testing purposes. You can run the client like this:
+
+       java -cp .\target\eventhubs-storm-spout-{version}-jar-with-dependencies.jar com.microsoft.eventhubs.client.EventHubSendClient
+       [username] [password] [entityPath] [partitionId] [messageSize] [messageCount]
+
+If you want to send messages to all partitions, use "-1" as partitionId.
+
+### Windows Azure Eventhubs ###
+       http://azure.microsoft.com/en-us/services/event-hubs/
+
