RE: Storm with RDBMS

2014-06-03 Thread Balakrishna R
Thanks Alex and Ncleung for your inputs.

Both options look valid, depending on the data size. I now think that even an 
RDBMS might scale well for this scenario if the bolts perform only read 
operations on the database. I will update the thread later with the approach 
we take.


Regards
Balakrishna

From: alex kamil [mailto:alex.ka...@gmail.com]
Sent: Monday, June 02, 2014 10:18 PM
To: user@storm.incubator.apache.org
Subject: Re: Storm with RDBMS

For parallel reads of massive historical data and high-volume writes, you could 
use a distributed DB with a SQL layer, such as Apache HBase + Phoenix 
(http://phoenix.incubator.apache.org/). I think it might complement Storm 
nicely.

On Mon, Jun 2, 2014 at 10:19 AM, Nathan Leung ncle...@gmail.com wrote:
Something like memcached is commonly used for this scenario.  Is memcached 
poorly suited for your goals or data access patterns?
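
The caching idea suggested here is often implemented as a cache-aside lookup: 
a bolt checks the cache first and queries the database only on a miss. The 
sketch below is a minimal illustration of that pattern, not code from this 
thread. It uses an in-process LRU map as a stand-in for memcached, and the 
names HistoryCache, loader, and the "acct-1" keys are hypothetical. The loader 
function stands in for whatever RDBMS query the bolt would run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CacheAsideSketch {

    /** Small LRU cache that falls back to a loader (the database) on a miss. */
    static final class HistoryCache<K, V> {
        // Hypothetical stand-in for the RDBMS query a bolt would issue.
        private final Function<K, V> loader;
        private final Map<K, V> entries;

        HistoryCache(final int maxEntries, Function<K, V> loader) {
            this.loader = loader;
            // accessOrder=true keeps the least recently used entry first,
            // so removeEldestEntry evicts in LRU order.
            this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        /** Returns the cached value, loading and caching it on a miss. */
        synchronized V get(K key) {
            V value = entries.get(key);
            if (value == null) {
                value = loader.apply(key);   // cache miss: go to the database
                if (value != null) {
                    entries.put(key, value);
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger dbReads = new AtomicInteger();
        HistoryCache<String, String> cache = new HistoryCache<>(2, id -> {
            dbReads.incrementAndGet();       // count simulated database reads
            return "history-for-" + id;
        });

        cache.get("acct-1");   // miss: loads from the "database"
        cache.get("acct-1");   // hit: served from the cache
        cache.get("acct-2");   // miss
        System.out.println("database reads: " + dbReads.get()); // prints "database reads: 2"
    }
}
```

In a real topology the same lookup would presumably live in the bolt's execute 
path, with a shared cache such as memcached replacing the local map so that 
parallel bolt tasks do not each load the same rows.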

On Mon, Jun 2, 2014 at 10:06 AM, Balakrishna R 
balakrishn...@spanservices.com wrote:
Hi,

We are evaluating Apache Storm for one of our business use cases. In this use 
case, the incoming transaction stream should be processed by a set of rules or 
logic. This processing also needs to take historical data into account (perhaps 
2 weeks or a month old).

We understand that Storm performs well when processing incoming transactions 
in real time. But what if the bolts have to read the historical data from an 
RDBMS and use it?
Will this degrade the performance of the whole cluster? The RDBMS might 
introduce delays under the high read load that comes from running many bolt 
instances in parallel for better throughput.

Any suggestions for handling this situation? Please share.


Thanks
Balakrishna

DISCLAIMER: This email message and all attachments are confidential and may 
contain information that is Privileged, Confidential or exempt from disclosure 
under applicable law. If you are not the intended recipient, you are notified 
that any dissemination, distribution or copying of this email is strictly 
prohibited.  If you have received this email in error, please notify us 
immediately by return email to 
mailad...@spanservices.com and destroy the 
original message.  Opinions, conclusions and other information in this message 
that do not relate to the official of SPAN, shall be understood to be neither 
given nor endorsed by SPAN.




Re: Storm with RDBMS

2014-06-02 Thread alex kamil
For parallel reads of massive historical data and high-volume writes, you
could use a distributed DB with a SQL layer, such as Apache HBase + Phoenix
(http://phoenix.incubator.apache.org/). I think it might complement Storm
nicely.


On Mon, Jun 2, 2014 at 10:19 AM, Nathan Leung ncle...@gmail.com wrote:

 Something like memcached is commonly used for this scenario.  Is memcached
 poorly suited for your goals or data access patterns?


 On Mon, Jun 2, 2014 at 10:06 AM, Balakrishna R 
 balakrishn...@spanservices.com wrote:

 Hi,

 We are evaluating Apache Storm for one of our business use cases. In this
 use case, the incoming transaction stream should be processed by a set of
 rules or logic. This processing also needs to take historical data into
 account (perhaps 2 weeks or a month old).

 We understand that Storm performs well when processing incoming
 transactions in real time. But what if the bolts have to read the
 historical data from an RDBMS and use it?

 Will this degrade the performance of the whole cluster? The RDBMS might
 introduce delays under the high read load that comes from running many
 bolt instances in parallel for better throughput.

 Any suggestions for handling this situation? Please share.

 Thanks
 Balakrishna
