Re: Storm use case

2014-09-19 Thread Florian Hussonnois
Actually, we just reply with 200 OK to users right after receiving the
request (i.e., before sending the message to Kafka).

We have implemented a native Storm topology which uses the core Kafka spout
(https://github.com/apache/storm/tree/master/external/storm-kafka).


-- 
Florian HUSSONNOIS
Tel +33 6 26 92 82 23


Re: Storm use case

2014-09-19 Thread Ayush Vatsyayan
Thanks Florian. I have one more question: how are you sending the reply back
to the user? Are you using DRPC in a Trident topology or DRPC in native
Storm?

The native one is deprecated, so I'm trying to use the Trident topology one.


Re: Storm use case

2014-09-19 Thread Florian Hussonnois
Hi,

I am currently working on a project that looks like yours.

We are using Vert.x to implement the web services. Each request is
formatted as JSON and sent to a Kafka topic. After that, we have a Storm
topology (using a KafkaSpout) which performs request-parameter extraction
and data enrichment on each message before writing it into HBase.
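
A minimal sketch of this flow, assuming an in-memory queue as a stand-in for
the Kafka topic and a plain list as a stand-in for the HBase table. The
function names, the sample request and the "source" field are hypothetical
and only illustrate the order of the steps, not the actual project code.

```python
import json
import queue

# In-memory stand-in for the Kafka topic (assumption: a real deployment
# would use a Kafka producer on the web side and a KafkaSpout in Storm).
topic = queue.Queue()


def handle_request(params):
    """Web tier: format the request as JSON and hand it to the topic."""
    topic.put(json.dumps(params))


def extract(message):
    """First processing step: parse the JSON and recover the parameters."""
    return json.loads(message)


def enrich(record):
    """Second step: add derived fields (a hypothetical 'source' tag here)."""
    enriched = dict(record)
    enriched["source"] = "web"
    return enriched


def drain(store):
    """Consume everything currently queued and write the results to the store."""
    while not topic.empty():
        store.append(enrich(extract(topic.get())))


store = []  # stand-in for the HBase table
handle_request({"user": "alice", "action": "login"})
drain(store)
```

The web tier only serializes and enqueues, so all per-message work happens
on the consuming side, which is the part that Storm parallelizes.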

In a development environment (one Vert.x instance and a Storm cluster
running on the same host), we handle more than 10K req/sec.

Hope this helps!


-- 
Florian HUSSONNOIS
Tel +33 6 26 92 82 23


Re: Storm use case

2014-09-18 Thread Ayush Vatsyayan
Can anyone provide any leads or insight into this?

I also looked into DRPC, but it's deprecated for native Storm. In Trident
it's a bit complex, and I cannot find anything that describes in depth what
Trident provides.


Storm use case

2014-09-16 Thread Ayush Vatsyayan
We are trying to build a web services application that can support *10k TPS*.
I'm doing some POCs on Storm, but I'm a bit concerned about whether Storm is
the right fit here. Here is the scenario:

The client will send a web service request, which we will receive (using
Apache CXF) and push into a message queue (probably Kafka or RabbitMQ). From
the queue, a Storm spout will receive it and send it to the bolt. In the bolt
we will perform validation that involves DB calls, and once that is done we
will persist the data in a NoSQL DB.

I understand the advantages of using Storm, but my concern is that we are
not doing any complex bolt chaining and might use only one or two bolts.
I'm not sure whether Storm fits well in this case.

P.S. We are planning to deploy the web services on an application server in
a cluster setup to support 10k TPS. I'm not sure whether a cluster setup is a
good approach, but I'll look into it later.


Re: storm use case in sensor data

2014-09-05 Thread padma priya chitturi
If the data is emitted continuously as streams from a sensor, then Storm
would be the ideal framework to process it. Storm in fact lets you process
streams of data in real time.

For a brief overview, go through the Storm documentation:
https://storm.incubator.apache.org/


storm use case in sensor data

2014-09-04 Thread Yuheng Du
Hi guys,

Does anyone use Storm to deal with sensor network data? I need some use
cases or research project ideas for Storm or other big data tools in the
sensor network field. Can I get a sense of the advantages of adopting the
Storm platform?

Best,

Yuheng


Re: storm use case questions

2014-09-03 Thread Yuheng Du
Hi all,

Thanks. Does anyone use Storm to deal with sensor network data? I need some
use cases or research project ideas for Storm or other big data tools in the
sensor network field. Can I get a sense of the advantages of adopting the
Storm platform?

Best,


Re: storm use case questions

2014-09-03 Thread Tian Guo
Hi, All

Regarding the average and standard deviation of a stream from a specific
sensor, these two statistics can be computed incrementally, with constant
time per update. So I do not see a heavy burden even with a trivial
implementation, and distributed stream processing looks redundant for only
hundreds of streams.
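
One common way to do this is Welford's online algorithm. The sketch below is
an illustration under that assumption, not necessarily the method meant
above; the class name and the sample readings are hypothetical. It covers a
running stream from the start of observation, not an arbitrary time range.

```python
import math


class RunningStats:
    """Welford's online algorithm: O(1) time and memory per update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new reading into the running mean and deviation sum."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        """Population standard deviation of the readings seen so far."""
        if self.count == 0:
            return 0.0
        return math.sqrt(self.m2 / self.count)


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

Keeping one such object per (deviceId, readingN) pair needs only three
numbers per pair, which is why a few hundred streams fit easily on a single
machine.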

Storm is a cluster-based distributed data processing system rather than
a decentralized system like a sensor network. Whether it is applicable to
your scenario depends on where you deploy it in your architecture.

Best,


Re: storm use case questions

2014-09-03 Thread Vikas Agarwal
Hi Yuheng,

We are also exploring Storm for analyzing streams of messages (a Twitter
stream and other sources). From my short experience, one thing I have come
to know is that a lot depends on the parallelism of the spouts in your
topology. If you can parallelize the ingestion of data using partitioning or
something similar, you can definitely benefit from Storm; otherwise you
would see a lot of failed messages, which may accumulate into a large
backlog of overflowing input data.
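
As a rough illustration of partitioned ingestion, the sketch below assigns
each message to a partition by hashing a key, so that several readers can
each take one partition. Keying on deviceId, using CRC32 and four partitions
are all assumptions made for the example.

```python
import zlib


def partition_for(device_id, num_partitions):
    """Stable hash (CRC32) so a given device always maps to the same partition."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def split_by_partition(messages, num_partitions):
    """Group messages into per-partition lists that independent readers can consume."""
    partitions = [[] for _ in range(num_partitions)]
    for message in messages:
        index = partition_for(message["deviceId"], num_partitions)
        partitions[index].append(message)
    return partitions


# Hypothetical input: 20 messages from 5 devices.
messages = [{"deviceId": "BOT-N%d" % (i % 5), "reading0": float(i)} for i in range(20)]
partitions = split_by_partition(messages, 4)
```

Because the hash is stable, all messages for one device stay in order within
a single partition while different devices are read in parallel.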


-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax


storm use case questions

2014-09-02 Thread Yuheng Du
Hi guys,

I have a stream of sensor data coming from RabbitMQ. Each sensor message is
in JSON format and has the following fields:

deviceId: "BOT-N3"
reading0: 2.25
reading1: 3.78
...
readingN: -1.35

Each float readingN represents a sensor reading at a specific field
location.

Now, for each incoming message, I want to run a query that gives me the
average and standard deviation of a given deviceId's readingN over a custom
time range (a year ago to now, a month ago to now, etc.). So if N=28, for
each incoming message I will need to run 28 queries on the historical data
at almost the same time. I need the query results to be returned in near
real time so that other incoming messages won't get blocked.
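
One way to keep such range queries cheap, sketched here purely as an
assumption about how this could be approached, is to maintain a small
mergeable summary (count, sum, sum of squares) per device, field and day,
and to combine the summaries for the requested range instead of rescanning
the raw history. The bucket size of one day, the function names and the
sample values are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date

# (device_id, field, day) -> [count, sum, sum of squares]
buckets = defaultdict(lambda: [0, 0.0, 0.0])


def record(device_id, field, day, value):
    """Fold one reading into its daily bucket in constant time."""
    bucket = buckets[(device_id, field, day)]
    bucket[0] += 1
    bucket[1] += value
    bucket[2] += value * value


def range_stats(device_id, field, start, end):
    """Mean and population standard deviation over [start, end], from buckets only."""
    n, total, total_sq = 0, 0.0, 0.0
    for (dev, fld, day), (count, s, sq) in buckets.items():
        if dev == device_id and fld == field and start <= day <= end:
            n += count
            total += s
            total_sq += sq
    if n == 0:
        return None
    mean = total / n
    # Clamp at zero to absorb tiny negative values caused by rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


record("BOT-N3", "reading0", date(2014, 9, 1), 2.0)
record("BOT-N3", "reading0", date(2014, 9, 1), 4.0)
record("BOT-N3", "reading0", date(2014, 9, 2), 6.0)
record("BOT-N3", "reading0", date(2014, 8, 1), 100.0)
result = range_stats("BOT-N3", "reading0", date(2014, 9, 1), date(2014, 9, 2))
```

A year-long range then touches at most a few hundred daily summaries per
field. The sum-of-squares formula can lose precision when readings are large
relative to their spread, and the linear scan over all buckets is only for
brevity; a real store would look buckets up by key.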

Is Storm a good solution to this issue?

I have already tried the Elasticsearch-Logstash-Kibana stack. It seems that
when the incoming message rate is high, the messages get blocked because
the ES server can't respond to hundreds of query requests at the same time.

Will Storm help me in this case? What are the common use cases of Storm in
processing real-time sensor data (coming from a sensor network specifically)?

Thanks!

Best,

Yuheng