Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hbase/PoweredBy" page has been changed by StevenNoels.
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=49&rev2=50

--------------------------------------------------

  
  [[http://www.drawntoscaleconsulting.com|Drawn to Scale Consulting]] consults on HBase, Hadoop, distributed search, and scalable architectures.
  
- [[http://www.filmweb.pl|Filmweb]] is a film web portal with a large dataset 
of films, persons and movie-related entities. We have just started a small 
cluster of 3 HBase nodes to handle our web cache persistency 
layer. We plan to increase the cluster size, and also to start migrating some 
of the data from our databases which have some demanding scalability 
requirements.  
+ [[http://www.filmweb.pl|Filmweb]] is a film web portal with a large dataset of films, persons and movie-related entities. We have just started a small cluster of 3 HBase nodes to handle our web cache persistence layer. We plan to increase the cluster size, and also to start migrating some of the data from our databases that have demanding scalability requirements.
  
  [[http://www.flurry.com|Flurry]] provides mobile application analytics.  We 
use HBase and Hadoop for all of our analytics processing, and serve all of our 
live requests directly out of HBase on our 50 node production cluster with tens 
of billions of rows over several tables.
  
@@ -12, +12 @@

  
  [[http://www.kalooga.com|Kalooga]] is a discovery service for image galleries. We use Hadoop, HBase, Chukwa and Pig on a 20-node cluster for our crawling, analysis and events processing.
  
- [[http://www.lilycms.org|Lily]] is an open source content repository backed 
by HBase and SOLR from Outerthought - scalable content applications.
+ [[http://www.lilyproject.org|Lily]] is an open source content repository backed by HBase and SOLR, from Outerthought (scalable content applications).
  
  [[http://www.mahalo.com|Mahalo]], "...the world's first human-powered search engine". All the markup that powers the wiki is stored in HBase. It's been in use for a few months now. !MediaWiki - the same software that powers Wikipedia - has version/revision control. Mahalo's in-house editors produce a lot of revisions per day, which was not working well in an RDBMS. An HBase-based solution for this was built and tested, and the data migrated out of MySQL and into HBase. Right now it's at something like 6 million items in HBase. The upload tool runs every hour from a shell script to back up that data, and on 6 nodes takes about 5-10 minutes to run - and does not slow down production at all.
  
@@ -24, +24 @@

  
  [[http://www.powerset.com/|Powerset (a Microsoft company)]] uses HBase to store raw documents.  We have a ~110 node Hadoop cluster running DFS, MapReduce, and HBase.  In our Wikipedia HBase table, we have one row for each Wikipedia page (~2.5M pages and climbing).  We use this as input to our indexing jobs, which are run in Hadoop MapReduce.  Uploading the entire Wikipedia dump to our cluster takes a couple of hours.  Scanning the table inside MapReduce is very fast -- the latency is in the noise compared to everything else we do.
  
- [[http://www.readpath.com/|ReadPath]] uses HBase to store several hundred 
million RSS items and dictionary for its RSS newsreader. Readpath is currently 
running on an 8 node cluster. 
+ [[http://www.readpath.com/|ReadPath]] uses HBase to store several hundred million RSS items and a dictionary for its RSS newsreader. ReadPath is currently running on an 8 node cluster.
  
  [[http://www.runa.com/|Runa Inc.]] offers a SaaS that enables online merchants to offer dynamic per-consumer, per-product promotions embedded in their website. To implement this, we collect the click streams of all their visitors and, together with the merchant's rules, determine what promotion to offer a visitor at different points while they browse the merchant's website. So we have lots of data and have to do lots of off-line and real-time analytics. HBase is the core for us. We also use Clojure and our own open-sourced distributed processing framework, Swarmiji. The HBase community has been key to our forward movement with HBase. We're looking for experienced developers to join us to help make things go even faster!
  
