supervisor start error
I installed Storm on a single machine. When I start the supervisor, it generates this error:

org.apache.thrift7.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:6627.
    at org.apache.thrift7.transport.TNonblockingServerSocket.init(TNonblockingServerSocket.java:89) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    at org.apache.thrift7.transport.TNonblockingServerSocket.init(TNonblockingServerSocket.java:68) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    at org.apache.thrift7.transport.TNonblockingServerSocket.init(TNonblockingServerSocket.java:61) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
    at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1137) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1167) ~[storm-core-0.9.0.1.jar:na]
    at backtype.storm.daemon.nimbus$_main.invoke(nimbus.clj:1189) ~[storm-core-0.9.0.1.jar:na]
    at clojure.lang.AFn.applyToHelper(AFn.java:159) ~[clojure-1.4.0.jar:na]
    at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
    at backtype.storm.daemon.nimbus.main(Unknown Source) ~[storm-core-0.9.0.1.jar:na]

I do not know what causes this.
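Two hints in the trace: port 6627 is the default Nimbus Thrift port, and the frames are from the nimbus daemon, so something is already bound to that port (often a Nimbus instance that is still running on the same machine). As a quick diagnostic, here is a generic sketch (not part of Storm) that probes whether the port is already taken:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    if port_in_use(6627):
        print("port 6627 is taken: another Nimbus (or other process) is running")
    else:
        print("port 6627 is free")
```

If the port is taken, `netstat -tlnp | grep 6627` (or `lsof -i :6627`) will show the owning process; killing the stale daemon usually clears this error.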
Re: Tuples lost in Storm 0.9.1
Hi, Daria

The lost tuples may be going to one of two places: 1) the message_queue in the netty client, which will cause a memory leak; 2) netty's internal buffer: if the connection is lost, all tuples in it are lost. So check your worker log to see if there is any connection-lost error.

Regards

2014-05-16 12:18 GMT+08:00 李家宏 jh.li...@gmail.com:

I am running into the same issue. Where did the lost tuples go? If they were queueing in the transport layer, memory usage should keep increasing, but I didn't see any noticeable memory leaks. Does Storm guarantee that all tuples sent from task A to task B will be received by task B? Moreover, do they arrive in order? Can anybody give any idea on this issue?

2014-04-02 20:56 GMT+08:00 Daria Mayorova d.mayor...@gmail.com:

Hi everyone,

We are having some issues with our Storm topology. The problem is that some tuples are being lost somewhere in the topology. Just after the topology is deployed it runs pretty well, but after several hours it starts to lose a significant number of tuples. From what we've found in the logs, tuples exit one bolt/spout and never enter the next bolt.

Here is some info about the topology:
- The version is 0.9.1, and netty is used as transport
- The spout extends BaseRichSpout, and the bolts extend BaseBasicBolt
- The spout is using a Kestrel message queue
- The cluster consists of 2 nodes: zookeeper, nimbus and ui run on one node, and the workers run on another node.

I am attaching the content of the config files below. We have also tried running the workers on the other node (the same one as nimbus and zookeeper), and on both nodes, but the behavior is the same. According to the Storm UI there are no failed tuples.

Can anybody give any idea of what might be the reason for the tuples getting lost? Thanks.
*Storm config (storm.yaml)* (when both nodes have workers running, the configuration is the same on both nodes; only the storm.local.hostname parameter changes)

storm.zookeeper.servers:
    - zkserver1
nimbus.host: nimbusserver
storm.local.dir: /mnt/storm
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.local.hostname: storm1server
nimbus.childopts: -Xmx1024m -Djava.net.preferIPv4Stack=true
ui.childopts: -Xmx768m -Djava.net.preferIPv4Stack=true
supervisor.childopts: -Xmx1024m -Djava.net.preferIPv4Stack=true
worker.childopts: -Xmx3548m -Djava.net.preferIPv4Stack=true
storm.cluster.mode: distributed
storm.local.mode.zmq: false
storm.thrift.transport: backtype.storm.security.auth.SimpleTransportPlugin
storm.messaging.transport: backtype.storm.messaging.netty.Context
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 # 5MB buffer
storm.messaging.netty.max_retries: 30
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

*Zookeeper config (zoo.cfg):*

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
autopurge.purgeInterval=24
autopurge.snapRetainCount=5
server.1=localhost:2888:3888

*Topology configuration* passed to the StormSubmitter:

Config conf = new Config();
conf.setNumAckers(6);
conf.setNumWorkers(4);
conf.setMaxSpoutPending(100);

Best regards,
Daria Mayorova

--
Gvain
Email: jh.li...@gmail.com
Implications of running multiple topologies without isolation
I am trying to understand the implications of running multiple topologies on a single cluster without using the isolation scheduler. The way this appears to work is isolation at the machine level, not the worker level. Our issue right now is that we only have 5 machines to work with. We have enough resources to run multiple workers per machine, but do not feel comfortable running each topology on fewer than all 5 machines. So the main questions are: 1) what issues do I risk running into if I run multiple topologies on a single cluster without the isolation scheduler, and 2) is there a way to isolate at the worker level, i.e., each worker handles tasks for a single topology? Thanks, Justin
Re: Implications of running multiple topologies without isolation
1) A worker can spawn any number of threads, so you can run into standard shared-resource issues (CPU, network, disk, etc.). RAM is not as big of a problem, since each worker gets a fixed amount. 2) A worker is spawned for a particular topology; it only executes spout/bolt tasks for the topology to which it is assigned.
Re: Implications of running multiple topologies without isolation
Try Storm on Mesos for isolation. http://mesosphere.io/learn/run-storm-on-mesos/

On Fri, Jun 6, 2014 at 10:03 AM, Justin Workman justinjwork...@gmail.com wrote:

Thanks for the responses. I assumed worker isolation worked this way; I had just read a couple of things that made me question it. Justin

On Jun 6, 2014, at 10:25 AM, Derek Dagit der...@yahoo-inc.com wrote:

> So the main questions are, 1) what issues do I risk running into if I run multiple topologies on a single cluster without the isolation scheduler,

Resource contention on shared boxes: CPU (number of cores), network, disk (if applicable). It depends on what these topologies are doing and which resources they will use the most.

> 2) is there a way to isolate at the worker level, i.e., each worker handles tasks for a single topology?

I thought this is the way it normally worked. A single worker JVM runs on behalf of one topology, but can run tasks from multiple different components (bolts/spouts) defined in that topology.

-- Derek

--
Lin Zhao
3101 Park Blvd, Palo Alto, CA 94306
Re: Order of Bolt definition, catching that subscribes from non-existent component [ ...]
I am sorry for the late reply. Yes, you can't have a loop. You can have a chain, though (one that doesn't close on itself!). Thanks :-)

On Wed, May 7, 2014 at 12:50 PM, shahab shahab.mok...@gmail.com wrote:

Thanks Abhishek. But this also implies that we cannot have a loop (of message-processing stages) using Storm, right? best, /Shahab

On Mon, May 5, 2014 at 9:45 PM, Abhishek Bhattacharjee abhishek.bhattacharje...@gmail.com wrote:

I don't think what you are trying to do is achievable. Data in Storm always moves forward, so you can't give it back to the bolt from which it originated. That is, a bolt can only subscribe to bolts that were declared before it. So I think you can create another instance of the A bolt, say D, and then feed the output of C to D.

On Mon, May 5, 2014 at 8:11 PM, shahab shahab.mok...@gmail.com wrote:

Hi,

I am trying to define a topology as follows:

S : a spout
A, B, C : bolts
--> : means "emits messages to"

S --> A
A --> B
B --> C
C --> A

I declare the spout and bolts in the above order in my Java code: first S, then A, B, and finally C. I am using globalGrouping(componentName, streamId) to define which messages each bolt collects. The problem is that while defining bolt A I receive an error saying that it "subscribes from non-existent component [C]". I guess the error happens because component C is not defined yet, but what could be the solution to this?

best, /Shahab

-- *Abhishek Bhattacharjee* *Pune Institute of Computer Technology*
Re: Order of Bolt definition, catching that subscribes from non-existent component [ ...]
You can have a loop on a different stream. It's not always the best thing to do (there are deadlock possibilities from buffers), but we have a production topology with that kind of pattern. In our case, one bolt acts as a coordinator for a recursive search.

Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
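The declaration-order constraint discussed above can be sketched outside of Storm. Below is a toy builder in Python (hypothetical names, not the Storm API) that, like Storm's TopologyBuilder, rejects a subscription to a component that has not been declared yet, reproducing the shape of shahab's error:

```python
class ToyTopologyBuilder:
    """Toy sketch (not the Storm API): a builder that rejects a
    subscription to a component that has not been declared yet."""

    def __init__(self):
        self.components = {}  # name -> list of (source, stream) inputs

    def set_component(self, name, subscribes_to=()):
        for source, stream in subscribes_to:
            if source not in self.components:
                raise ValueError(
                    "%s subscribes from non-existent component [%s]"
                    % (name, source))
        self.components[name] = list(subscribes_to)


builder = ToyTopologyBuilder()
builder.set_component("S")
try:
    # A wants input from C, but C has not been declared yet: the
    # same shape of failure as in the thread above.
    builder.set_component("A", [("S", "default"), ("C", "loopback")])
except ValueError as e:
    print(e)  # A subscribes from non-existent component [C]
```

In real Storm the workaround follows Michael's advice: declare the forward chain S, A, B, C first, then route the closing C-to-A edge over a separate stream, keeping in mind the buffer-deadlock caveat.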
Time Partitioning of Tuples
Hi Everyone, I'm currently investigating different data processing tools for an application I'm interested in. I have many sensors that I collect data from. However, I would like to group the data from every sensor at predefined time intervals and process it together. Using Storm terminology, I would have each sensor send data to a spout. The spouts would then send tuples to a specific bolt that will process all of the data within a specific time partition. Each spout will tag each event with a time id and each bolt will process data after collecting all of the data with the same time id tags. Is this possible with Storm? I appreciate your help! Jonathan
Re: Time Partitioning of Tuples
You could send a signal tuple from the spout when it knows it has sent the last tuple for a time period, or include a field in the tuple indicating that it is the last member. I'm curious about why you want to do this, since the purpose of Storm is to facilitate stream processing rather than the type of batch processing you're describing. -- Kyle

On 06/06/2014 05:14 PM, Jonathan Poon wrote:

Hi Nathan, The sensor data I have is naturally time-sorted, since it's just collecting data and emitting it to a spout. Is it possible for a bolt to know when all of the tuples with the same time tag have been collected and to start processing them together? Or is it only possible for a bolt to process each tuple one at a time? Thanks!

On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung ncle...@gmail.com wrote:

You can have your bolt subscribe to the spout using fields grouping and use the time tag as your key.
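Kyle's signal-tuple pattern can be sketched outside of Storm. The following Python sketch (class and tuple shapes are made up for illustration, not Storm API) shows a bolt-like aggregator that buffers tuples per time id and only processes a window once the end-of-window marker for that id arrives:

```python
from collections import defaultdict

class WindowingBolt:
    """Buffers tuples per time id; flushes a window when its marker arrives.
    Mirrors the signal-tuple pattern: the spout emits ("END", time_id, None)
    after the last data tuple for that window."""

    def __init__(self, process_fn):
        self.buffers = defaultdict(list)
        self.process_fn = process_fn

    def execute(self, tup):
        kind, time_id, payload = tup
        if kind == "DATA":
            self.buffers[time_id].append(payload)
        elif kind == "END":
            batch = self.buffers.pop(time_id, [])
            self.process_fn(time_id, batch)


results = {}
bolt = WindowingBolt(lambda tid, batch: results.update({tid: sum(batch)}))
for t in [("DATA", 1, 10), ("DATA", 1, 5), ("DATA", 2, 7), ("END", 1, None)]:
    bolt.execute(t)
print(results)  # window 1 flushed as {1: 15}; window 2 still buffering
```

Combined with Nathan's fields grouping on the time tag, all tuples for one window land on the same bolt instance, so each instance only needs its own local buffers.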
Re: Time Partitioning of Tuples
Hi Kyle,

I'm looking for a real-time batch processing tool. In my case, I'm looking to make correlations between all of the sensors at each time interval. I could use Hadoop (MapReduce), but that requires me to collect all of the data before I can batch-process each time partition of data from each sensor. Another tool I'm looking at is Spark Streaming, which lets me collect data at different time intervals and process each batch of data using MapReduce. However, MapReduce seems inefficient because my sensor data is already naturally time-sorted. In addition, I would like real-time data on the fly. It seems like Storm might be a candidate for this application. Please let me know what you think! Thanks for your help! Jonathan
Re: Time Partitioning of Tuples
Sounds interesting. I don't know much about your project, so I won't speculate about your purposes. One thing to consider is that the computation on a time slice must take longer than the time slice itself for this type of setup to really be worthwhile. Otherwise you could just feed the batches through the same bolt, since it would finish processing one batch before the next one comes in. -- Kyle
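Kyle's observation about computation time versus slice length reduces to simple arithmetic: if processing one window takes longer than the window itself, you need roughly ceil(processing_time / slice_length) windows in flight at once (e.g. via bolt parallelism) to keep up. A rough sketch, assuming stable per-window processing time (the function name is made up for illustration):

```python
import math

def windows_in_flight(processing_secs, slice_secs):
    """How many windows must be processed concurrently so the pipeline
    keeps up with the incoming stream, assuming stable processing time."""
    return max(1, math.ceil(processing_secs / slice_secs))

# A 10 s window that takes 25 s to process needs 3 concurrent consumers;
# if processing fits inside the slice, a single bolt instance keeps up.
print(windows_in_flight(25, 10))  # -> 3
print(windows_in_flight(4, 10))   # -> 1
```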
Re: Time Partitioning of Tuples
I will take a look at Trident as well. Thanks for the tip! Jonathan
RE: Time Partitioning of Tuples
You might look at Esper. I believe someone has even embedded Esper into Storm. -Dan