Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi,

The way you describe the in-memory caching component, it looks very
similar to the HBase memstore. Any reason for not relying on it?

N.

On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian 
christian.kleegr...@siemens.com wrote:

 Dear all,

 We are currently working on an architecture for a system that should
 serve as an archive for 1000+ measuring components that frequently (~30/s)
 send messages containing measurement values (~300 bytes/message). The
 archiving system should be capable of not only serving as a long term
 storage but also as a kind of streaming data processing and caching
 component. There are several functions that should be computed on the
 incoming data before finally storing it.
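A rough sizing of the load described above can be sketched as follows. This is only a back-of-the-envelope estimate: it assumes that ~30/s is the rate of each component (the message does not say whether it is per component or in total), it uses the lower bound of 1000 components, and it counts raw payload bytes only. The function names are hypothetical.

```python
# Rough ingest sizing. Assumption: ~30 msg/s is the rate of EACH component.
COMPONENTS = 1000      # "1000+" in the message; the lower bound is used here
MSGS_PER_SEC = 30      # assumed per-component message rate
BYTES_PER_MSG = 300    # approximate message size from the message


def ingest_rate_bytes(components=COMPONENTS, msgs_per_sec=MSGS_PER_SEC,
                      bytes_per_msg=BYTES_PER_MSG):
    """Raw payload bytes arriving per second, ignoring storage overhead."""
    return components * msgs_per_sec * bytes_per_msg


def window_bytes(minutes, rate_bytes_per_sec):
    """Raw payload bytes accumulated over a window of the given length."""
    return minutes * 60 * rate_bytes_per_sec


rate = ingest_rate_bytes()
print(f"messages/s: {COMPONENTS * MSGS_PER_SEC}")
print(f"ingest: {rate / 1e6:.1f} MB/s")
for minutes in (10, 20):
    print(f"{minutes} min window: {window_bytes(minutes, rate) / 1e9:.1f} GB")
```

Under these assumptions the system sees about 30,000 messages/s and 9 MB/s of raw payload, so the 10 - 20 minute cache window mentioned further down would hold roughly 5.4 to 10.8 GB. If ~30/s is instead the total rate across all components, every figure shrinks by a factor of 1000. Per-cell storage overhead (row key, column names, timestamps) would come on top of the raw payload.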

 We suggested an architecture that comprises:
 - A message routing component that routes data to calculations and
   routes calculation results to other components that are interested in
   these data.
 - An in-memory caching component that stores up to 10 - 20 minutes of
   data before it is written to the long-term archive.
 - An hBase database that is used for long-term storage.
 - A MapReduce framework for doing analytics on the data stored in the
   hBase database.
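The routing idea in the list above (raw data goes to calculations, and calculation results go to whoever is interested) can be sketched as a small topic-based router. This is an in-process toy under stated assumptions, not a recommendation for a specific tool: the `Router` class, the topic names, and the message layout are all hypothetical.

```python
from collections import defaultdict


class Router:
    """Toy in-process topic router; all names here are hypothetical."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to each subscriber of the topic, in order."""
        for handler in list(self._subscribers[topic]):
            handler(message)


router = Router()
archive = []   # stands in for the cache / long-term store
alerts = []    # stands in for some other interested component


def mean_calculation(msg):
    """A calculation that republishes its result on a results topic."""
    values = msg["values"]
    result = {"sensor": msg["sensor"], "mean": sum(values) / len(values)}
    router.publish("results/mean", result)


router.subscribe("raw", mean_calculation)
router.subscribe("raw", archive.append)
router.subscribe("results/mean", archive.append)
router.subscribe("results/mean",
                 lambda r: alerts.append(r) if r["mean"] > 50 else None)

router.publish("raw", {"sensor": "s1", "values": [40, 60, 80]})
print(len(archive), alerts)
```

The sketch only shows the data flow. It has no persistence, no delivery guarantees, and no failover, so the failsafe and scaling requirements stated in the message would have to be met by a distributed messaging system.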

 The complete system should be failsafe and reliable regarding component
 failures and it should scale with the number of computers that are utilized.

 Does the community have any suggestions or feedback on this approach?
 Are there any suggestions as to which tools or systems to use for the
 message routing component and the in-memory cache?

 Thanks for any help and suggestions

 all the best

 Christian


 8<---

 Siemens AG
 Corporate Technology
 Corporate Research and Technologies
 CT T DE IT3
 Otto-Hahn-Ring 6
 81739 Munich, Germany
 Tel.: +49 89 636-42722
 Fax: +49 89 636-41423
 mailto:christian.kleegr...@siemens.com

 Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard
 Cromme; Managing Board: Peter Loescher, Chairman, President and Chief
 Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe
 Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y.
 Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany;
 Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684;
 WEEE-Reg.-No. DE 23691322



Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi Christian,

It's a component internal to HBase, so you don't have to use it directly.
See http://hbase.apache.org/book/wal.html on how writes are handled by
HBase to ensure reliability and data distribution.
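The general pattern behind that page, a write-ahead log in front of an in-memory buffer, can be illustrated with a toy model. This is a minimal sketch of the pattern only, not HBase's actual code or API: `ToyStore` and its methods are hypothetical, the log is a local JSON-lines file, and the flushed segments are kept in a Python list rather than persisted. In HBase the log is written to HDFS, which replicates it across nodes; the memstore itself is not replicated and is rebuilt from the log after a failure.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy write-ahead log plus in-memory buffer; NOT HBase's actual code."""

    def __init__(self, wal_path, flush_threshold=3):
        self.wal_path = wal_path
        self.flush_threshold = flush_threshold
        self.memstore = {}   # recent writes, held in memory
        self.flushed = []    # stands in for immutable files (not persisted here)
        self._replay()

    def _replay(self):
        """Rebuild the in-memory buffer from the log after a restart."""
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as wal:
                for line in wal:
                    key, value = json.loads(line)
                    self.memstore[key] = value

    def put(self, key, value):
        # 1. Append to the log and force it to disk before acknowledging.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps([key, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply the write to the in-memory buffer.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move the buffer out in key order, then truncate the log."""
        self.flushed.append(sorted(self.memstore.items()))
        self.memstore = {}
        open(self.wal_path, "w").close()

    def get(self, key):
        """Look in the buffer first, then in flushed segments, newest first."""
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.flushed):
            for k, v in segment:
                if k == key:
                    return v
        return None


wal_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = ToyStore(wal_path)
store.put("sensor1/0001", 21.5)
store.put("sensor1/0002", 21.7)

# Simulate a crash: ignore the old object and recover from the log alone.
recovered = ToyStore(wal_path)
print(recovered.get("sensor1/0002"))
```

The point of the ordering in `put` is that a write is never held only in memory: anything in the buffer can be reconstructed by replaying the log, so durability depends on how the log is stored rather than on replicating the buffer.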

Cheers,

N.

On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian 
christian.kleegr...@siemens.com wrote:

 Hi

 Is this memstore replicated? Since we keep a significant amount of data
 in the memory cache, we need a replicated solution. Also, I can't find much
 information besides the Java API doc for the MemStore class. I will
 continue searching, but if you have any URL with more documentation,
 please send it. Thanks in advance.

 regards

 Christian




 -Original Message-
 From: N Keywal [mailto:nkey...@gmail.com]
 Sent: Friday, 16 March 2012 18:02
 To: user@hbase.apache.org
 Subject: Re: Streaming data processing and hBase

 Hi,

 The way you describe the in memory caching component, it looks very
 similar to HBase memstore. Any reason for not relying on it?

 N.
