Re: Storm use case
Actually, we just reply with 200 OK to users right after receiving the request (i.e. before sending the message to Kafka).

We have implemented a native Storm topology which uses the core Kafka spout (https://github.com/apache/storm/tree/master/external/storm-kafka).

2014-09-19 12:24 GMT+02:00 Ayush Vatsyayan:

> Thanks Florian. I have one more query: how are you sending the reply back
> to the user? Are you using DRPC in a Trident topology or DRPC in native
> Storm?
>
> The native one is deprecated, so I'm trying to use the Trident topology
> one.

--
Florian HUSSONNOIS
Tel +33 6 26 92 82 23
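The acknowledge-first approach described in this reply can be sketched roughly as follows. This is only an illustration in plain Python, not the poster's vert.x code: `handle_request`, `drain`, `publish`, and `outbox` are hypothetical names, and an in-memory queue stands in for the Kafka producer.

```python
import json
import queue
import threading

# In-memory stand-in for the Kafka topic (assumption: a real system would
# hand these messages to a Kafka producer instead).
outbox = queue.Queue()
published = []


def publish(message: str) -> None:
    """Hypothetical sink; here it only records what would be sent."""
    published.append(message)


def handle_request(params: dict) -> int:
    """Queue the request as JSON and acknowledge immediately."""
    outbox.put(json.dumps(params, sort_keys=True))
    return 200  # the reply does not wait for the message to be published


def drain() -> None:
    """Background worker that forwards queued messages to the sink."""
    while True:
        message = outbox.get()
        if message is None:  # sentinel: stop the worker
            break
        publish(message)


worker = threading.Thread(target=drain)
worker.start()
status = handle_request({"user": "u1", "action": "login"})
outbox.put(None)
worker.join()
print(status, published)
```

With this shape the client only learns that the request was received, not how it was processed, which is why no DRPC round trip is involved.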
Re: Storm use case
Thanks Florian. I have one more query: how are you sending the reply back to the user? Are you using DRPC in a Trident topology or DRPC in native Storm?

The native one is deprecated, so I'm trying to use the Trident topology one.

On Fri, Sep 19, 2014 at 3:08 PM, Florian Hussonnois wrote:

> Hi,
>
> I am currently working on a project which looks like yours.
>
> We are using vert.x for implementing web services. Each request is
> formatted as JSON and sent to a Kafka topic. After that, we have a Storm
> topology (using a KafkaSpout) which performs request parameter extraction
> and data enrichment on each message before writing it into HBase.
>
> In a development environment (one vert.x instance and a Storm cluster
> running on the same host) we actually handle more than 10K req/sec.
>
> Hope this helps!
Re: Storm use case
Hi,

I am currently working on a project which looks like yours.

We are using vert.x for implementing web services. Each request is formatted as JSON and sent to a Kafka topic. After that, we have a Storm topology (using a KafkaSpout) which performs request parameter extraction and data enrichment on each message before writing it into HBase.

In a development environment (one vert.x instance and a Storm cluster running on the same host) we actually handle more than 10K req/sec.

Hope this helps!

2014-09-19 8:43 GMT+02:00 Ayush Vatsyayan:

> Can anyone provide any leads or insight into it?
>
> I also looked into DRPC, but it's deprecated for native Storm. In Trident
> it's a bit complex, and I cannot find anything that describes in depth
> what Trident provides.

--
Florian HUSSONNOIS
Tel +33 6 26 92 82 23
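The extraction and enrichment step mentioned in this message might look something like the sketch below. It is plain Python rather than Storm bolt code, and the request layout, the `SERVICE_OWNERS` lookup table, and all function names are assumptions made for illustration only.

```python
import json

# Hypothetical reference data used for enrichment (assumption).
SERVICE_OWNERS = {"/orders": "sales", "/users": "accounts"}


def extract_parameters(raw_message: str) -> dict:
    """Parse the JSON message and keep only the fields of interest."""
    request = json.loads(raw_message)
    return {"path": request["path"], "user": request["params"]["user"]}


def enrich(record: dict) -> dict:
    """Add derived fields without mutating the input record."""
    enriched = dict(record)
    enriched["owner"] = SERVICE_OWNERS.get(record["path"], "unknown")
    return enriched


def to_row(record: dict) -> tuple:
    """Shape the record as (row key, columns) for a key-value store."""
    row_key = f'{record["user"]}:{record["path"]}'
    return row_key, record


raw = json.dumps({"path": "/orders", "params": {"user": "u42"}})
row_key, columns = to_row(enrich(extract_parameters(raw)))
print(row_key, columns)
```

In a topology each of these functions would typically sit inside a bolt, with the final write going to the store instead of being printed.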
Re: Storm use case
Can anyone provide any leads or insight into it?

I also looked into DRPC, but it's deprecated for native Storm. In Trident it's a bit complex, and I cannot find anything that describes in depth what Trident provides.

On Tue, Sep 16, 2014 at 12:43 PM, Ayush Vatsyayan wrote:

> We are trying to build a web services application that can support *10k
> TPS*. I'm trying to do some POCs on Storm, but I'm a bit concerned about
> whether Storm is the right fit here. Here is the scenario:
>
> The client will send a web service request, which we will receive (using
> Apache CXF) and push into JMS (probably Kafka or RabbitMQ). From JMS a
> Storm spout will receive it and send it to the bolt. In the bolt we will
> perform validation that involves db calls, and once done we will persist
> the data in a NoSQL db.
>
> I understand the advantages of using Storm, but my concern is that we are
> not performing any complex bolt chaining and might be using only one or
> two bolts. I'm unsure whether Storm fits well in this case.
>
> P.S. We are planning to deploy the web services on an application server
> in a cluster setup to support 10k TPS. I'm not sure whether a cluster
> setup is a good approach, but I'll look into it later.
Storm use case
We are trying to build a web services application that can support *10k TPS*. I'm trying to do some POCs on Storm, but I'm a bit concerned about whether Storm is the right fit here. Here is the scenario:

The client will send a web service request, which we will receive (using Apache CXF) and push into JMS (probably Kafka or RabbitMQ). From JMS a Storm spout will receive it and send it to the bolt. In the bolt we will perform validation that involves db calls, and once done we will persist the data in a NoSQL db.

I understand the advantages of using Storm, but my concern is that we are not performing any complex bolt chaining and might be using only one or two bolts. I'm unsure whether Storm fits well in this case.

P.S. We are planning to deploy the web services on an application server in a cluster setup to support 10k TPS. I'm not sure whether a cluster setup is a good approach, but I'll look into it later.
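One way to reason about whether one or two bolts can keep up with 10k TPS is a back-of-the-envelope estimate using Little's law (work in flight = arrival rate x time per item). The sketch below is only that: the 5 ms and 12 ms validation latencies are assumptions, not figures from this thread, and `required_parallelism` is a hypothetical helper.

```python
import math


def required_parallelism(throughput_per_sec: int, latency_ms: int) -> int:
    """Little's law: concurrent workers needed to sustain the throughput."""
    return math.ceil(throughput_per_sec * latency_ms / 1000)


TARGET_TPS = 10_000  # target stated in the post

# Assumed per-message validation latencies (including the db call).
for assumed_latency_ms in (5, 12):
    workers = required_parallelism(TARGET_TPS, assumed_latency_ms)
    print(f"{assumed_latency_ms} ms per message -> {workers} concurrent executors")
```

The point of the estimate is that the db-bound validation step, not the number of bolts in the chain, tends to dictate how much parallelism the topology needs.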
Re: storm use case in sensor data
If the data is emitted continuously as streams from a sensor, then Storm would be the ideal framework to process it. Storm in fact lets you process streams of data in real time. For a brief overview, go through the Storm documentation:

https://storm.incubator.apache.org/

On Thu, Sep 4, 2014 at 7:50 PM, Yuheng Du wrote:

> Hi guys,
>
> Does anyone use Storm to deal with sensor network data? I need some use
> cases or research project ideas for Storm or other big data tools in the
> sensor network field. Can I get a sense of the advantages of adopting the
> Storm platform?
>
> Best,
>
> Yuheng
storm use case in sensor data
Hi guys,

Does anyone use Storm to deal with sensor network data? I need some use cases or research project ideas for Storm or other big data tools in the sensor network field. Can I get a sense of the advantages of adopting the Storm platform?

Best,

Yuheng
Re: storm use case questions
Hi all,

Thanks. Does anyone use Storm to deal with sensor network data? I need some use cases or research project ideas for Storm or other big data tools in the sensor network field. Can I get a sense of the advantages of adopting the Storm platform?

Best,

On Wed, Sep 3, 2014 at 4:51 AM, Tian Guo wrote:

> Hi all,
>
> Regarding the average and standard deviation of a stream from a specific
> sensor, these two values can be computed incrementally and take constant
> time to update. So I do not see the burden, even with a trivial
> implementation. Distributed stream processing looks redundant for only
> hundreds of streams.
>
> Storm is a cluster-based distributed data processing system rather than a
> decentralized system like a sensor network. Whether it is applicable to
> your scenario depends on where you deploy it inside your architecture.
>
> Best,
Re: storm use case questions
Hi all,

Regarding the average and standard deviation of a stream from a specific sensor, these two values can be computed incrementally and take constant time to update. So I do not see the burden, even with a trivial implementation. Distributed stream processing looks redundant for only hundreds of streams.

Storm is a cluster-based distributed data processing system rather than a decentralized system like a sensor network. Whether it is applicable to your scenario depends on where you deploy it inside your architecture.

Best,

2014-09-03 8:59 GMT+02:00 Vikas Agarwal:

> Hi Yuheng,
>
> We are also exploring/implementing Storm for analyzing streams of
> messages (Twitter stream and other sources). From my short experience,
> one thing I came to know is that a lot depends on the parallelism of the
> spouts in your topology. If you can parallelize the ingestion of data
> using partitioning or similar, you can definitely benefit from Storm;
> otherwise you would see a lot of failed messages, which may accumulate
> into a large backlog of overflowing input data.
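The incremental computation mentioned in this reply can be sketched with Welford's online algorithm, which updates the mean and variance in constant time per reading. This is a minimal illustration, not code from the thread; `RunningStats` is a hypothetical name, and it tracks the whole stream seen so far rather than a custom time range.

```python
import math


class RunningStats:
    """Running mean and population standard deviation (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        """Fold one reading into the statistics in O(1) time."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        """Population standard deviation; 0.0 when fewer than two readings."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / self.count)


stats = RunningStats()
for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.update(reading)
print(stats.mean, stats.stddev)
```

One such object per (deviceId, reading) pair would be enough for all-time statistics; restricting the result to an arbitrary time range needs additional bookkeeping.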
Re: storm use case questions
Hi Yuheng,

We are also exploring/implementing Storm for analyzing streams of messages (Twitter stream and other sources). From my short experience, one thing I came to know is that a lot depends on the parallelism of the spouts in your topology. If you can parallelize the ingestion of data using partitioning or similar, you can definitely benefit from Storm; otherwise you would see a lot of failed messages, which may accumulate into a large backlog of overflowing input data.

On Wed, Sep 3, 2014 at 1:01 AM, Yuheng Du wrote:

> Hi guys,
>
> I have a stream of sensor data coming from RabbitMQ. Each sensor message
> is in JSON format and has the following fields:
>
> deviceId: "BOT-N3"
> reading0: 2.25
> reading1: 3.78
>
> readingN: -1.35
>
> Each float value readingN represents a sensor reading at a specific field
> location.
>
> Now for each incoming message, I want to run a query which gives me the
> average and standard deviation of a certain deviceId's readingN over a
> custom time range (a year ago to now, a month ago to now, etc.). So if
> N=28, for each incoming message I will need to run 28 queries on the
> historic data at almost the same time. I need the query results to be
> returned in near real time so that other incoming messages won't get
> blocked.
>
> Is Storm a good solution to this issue?
>
> I have already tried the Elasticsearch-Logstash-Kibana stack. It seems
> that when the incoming message rate is high, the messages are blocked
> because the ES server can't respond to hundreds of query requests at the
> same time.
>
> Will Storm help me in this case? What is the common use case of Storm in
> processing real-time sensor data (coming from a sensor network
> specifically)?
>
> Thanks!
>
> Best,
>
> Yuheng

--
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax
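The partitioning idea in this reply can be illustrated with a small sketch: messages are routed to one of several parallel consumers by hashing the device id, so each device's readings always reach the same consumer (similar in spirit to a fields grouping in Storm). The function name, the partition count of 4, and the extra device ids are assumptions for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed number of parallel consumers


def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a device id to a partition with a hash that is stable across runs."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


messages = [
    {"deviceId": "BOT-N3", "reading0": 2.25},
    {"deviceId": "BOT-N7", "reading0": 1.10},  # hypothetical device
    {"deviceId": "BOT-N3", "reading0": 2.40},
    {"deviceId": "BOT-N9", "reading0": 0.75},  # hypothetical device
]

# Group messages by the partition that would consume them.
assignments = defaultdict(list)
for message in messages:
    assignments[partition_for(message["deviceId"])].append(message)

for partition, batch in sorted(assignments.items()):
    print(partition, [m["deviceId"] for m in batch])
```

`zlib.crc32` is used instead of Python's built-in `hash` because the latter is randomized per process for strings, which would break stable routing.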
storm use case questions
Hi guys,

I have a stream of sensor data coming from RabbitMQ. Each sensor message is in JSON format and has the following fields:

deviceId: "BOT-N3"
reading0: 2.25
reading1: 3.78

readingN: -1.35

Each float value readingN represents a sensor reading at a specific field location.

Now for each incoming message, I want to run a query which gives me the average and standard deviation of a certain deviceId's readingN over a custom time range (a year ago to now, a month ago to now, etc.). So if N=28, for each incoming message I will need to run 28 queries on the historic data at almost the same time. I need the query results to be returned in near real time so that other incoming messages won't get blocked.

Is Storm a good solution to this issue?

I have already tried the Elasticsearch-Logstash-Kibana stack. It seems that when the incoming message rate is high, the messages are blocked because the ES server can't respond to hundreds of query requests at the same time.

Will Storm help me in this case? What is the common use case of Storm in processing real-time sensor data (coming from a sensor network specifically)?

Thanks!

Best,

Yuheng
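One possible way to avoid running 28 queries over raw history for every message is to keep small per-day aggregates (count, sum, sum of squares) for each device and reading, and combine them at query time. The sketch below is an assumption about how that could look, not something proposed in the thread; daily granularity, the in-memory dictionary, and the names `record` and `query` are all illustrative.

```python
import math
from datetime import date, timedelta

# (device id, reading name, day) -> [count, sum, sum of squares]
buckets = {}


def record(message: dict, day: date) -> None:
    """Fold every readingN field of one message into that day's bucket."""
    device_id = message["deviceId"]
    for name, value in message.items():
        if name.startswith("reading"):
            bucket = buckets.setdefault((device_id, name, day), [0, 0.0, 0.0])
            bucket[0] += 1
            bucket[1] += value
            bucket[2] += value * value


def query(device_id: str, name: str, start: date, end: date):
    """Return (mean, population stddev) over [start, end], or None if empty."""
    count, total, total_sq = 0, 0.0, 0.0
    day = start
    while day <= end:
        bucket = buckets.get((device_id, name, day))
        if bucket is not None:
            count += bucket[0]
            total += bucket[1]
            total_sq += bucket[2]
        day += timedelta(days=1)
    if count == 0:
        return None
    mean = total / count
    variance = max(total_sq / count - mean * mean, 0.0)  # guard rounding error
    return mean, math.sqrt(variance)


record({"deviceId": "BOT-N3", "reading0": 2.0}, date(2014, 9, 1))
record({"deviceId": "BOT-N3", "reading0": 4.0}, date(2014, 9, 2))
print(query("BOT-N3", "reading0", date(2014, 9, 1), date(2014, 9, 2)))
```

The cost of a query then grows with the number of days in the range rather than the number of raw readings. The sum-of-squares form can lose precision for large values, so a production version would likely merge per-bucket means and variances instead.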