Hi,
I am trying to integrate Storm with Kafka, but could not find any proper
documentation for doing so.
Can anyone please help me get started with the integration?
Thanks and Regards,
Komal Thombare
Do you use Trident or only Storm?
Regards,
Andres
On 02/06/2014, at 10:43, Komal Thombare komal.thomb...@tcs.com wrote:
Only Storm
Thanks and Regards,
Komal Thombare
Tata Consultancy Services Limited
Ph:- 086-55388772
Mail-to: komal.thomb...@tcs.com
Website: http://www.tcs.com
Experience certainty. IT Services
Business Solutions
Consulting
Hi Komal
Have you looked at KafkaSpout?
Thanks
Deepak
On Mon, Jun 2, 2014 at 2:13 PM, Komal Thombare komal.thomb...@tcs.com
wrote:
Hi Deepak,
Yes, I have. I have also got the storm-contrib source code, but I am
unaware of how to compile it.
Thanks and Regards,
Komal Thombare
Please take a look at
http://www.michael-noll.com/blog/2014/05/27/kafka-storm-integration-example-tutorial/#state-of-the-integration-game
There is a GitHub project too: https://github.com/miguno/kafka-storm-starter
It covers the latest Storm, Kafka and Avro.
You should make a clone of this project:
https://github.com/apache/incubator-storm/tree/master/external/storm-kafka
and run "mvn install". I suppose you use Kafka 0.8.+,
and then you do this:
BrokerHosts hosts = new ZkHosts("localhost:2181"); // ZooKeeper host:port
SpoutConfig spoutConfig = new SpoutConfig(hosts, "your-topic", "/kafka-spout", "spout-id");
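From there, wiring the spout into a topology looks roughly like this (a sketch, assuming storm-kafka for Kafka 0.8 is on the classpath; the topic name, zkRoot, ids and the downstream PrintBolt are placeholders, not from the thread):

```java
BrokerHosts hosts = new ZkHosts("localhost:2181");   // ZooKeeper connect string
SpoutConfig spoutConfig = new SpoutConfig(
        hosts,           // where to discover the Kafka brokers
        "your-topic",    // topic to consume
        "/kafka-spout",  // ZK root path for storing consumer offsets
        "spout-id");     // consumer id under that root
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // emit messages as strings

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
builder.setBolt("print-bolt", new PrintBolt(), 1).shuffleGrouping("kafka-spout");
```

The zkRoot/id pair is what lets the spout resume from committed offsets after a restart.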
Hi
I am using apache-storm-0.9.1-incubating.
I have a simple topology: a spout reads from a Kafka topic and a bolt writes
the lines from the spout to HBase.
Recently we did a test - we sent 300,000,000 messages over kafka-rest ->
kafka queue -> storm topology -> hbase.
I noticed that around one hour and
Dear Komal,
I used the storm-kafka version mentioned in the link.
http://anisnasir.wordpress.com/2014/05/25/apache-storm-kafka-a-nice-choice/
Regards
Anis
On Mon, Jun 2, 2014 at 1:49 PM, Komal Thombare komal.thomb...@tcs.com
wrote:
Hi Andres,
Still I am getting the same error.
Thanks
Hi,
We are doing a benchmark test and limiting the number of cores used by Storm.
The topology contains 1 spout and 6 bolts. The first bolt has the highest load.
Our experiment: we run the topology on a single machine with 4 cores. We
start with 1 core and then increase to 2, 3 and 4
The bolt base classes have a prepare method:
https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseBasicBolt.html
and the spout base classes have a similar activate method:
https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseRichSpout.html
Is that
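For reference, overriding prepare in a bolt looks roughly like this (a sketch, assuming Storm 0.9.x on the classpath; SomeClient is a hypothetical expensive resource, not from the thread):

```java
public class InitBolt extends BaseBasicBolt {
    private transient SomeClient client; // hypothetical resource built once per instance

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        // Called once per bolt instance when the worker starts it,
        // before any tuples arrive - a natural place for init code.
        client = new SomeClient();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // use client here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```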
First, there is a small typo-like error in:
https://github.com/apache/incubator-storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java
line 217:
if (lastCompletedOffset != lastCompletedOffset) {
I guess it should be something like if (_committedTo != lastCompletedOffset)
Thanks for the tips Chi,
I'm a little confused about the partitioning. I had thought that the number
of partitions was determined by the amount of parallelism in the topology.
For example if I said .parallelismHint(4), then I would have 4 different
partitions. Is this not the case ?
Is there a set
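(For illustration: parallelismHint sets the number of spout executors, not the number of partitions - those are fixed on the Kafka side, and storm-kafka spreads the existing partitions across the spout tasks. A toy sketch of that round-robin spread, in plain Java; this is an illustration, not the actual storm-kafka code:)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignment {
    // Toy model: spread N Kafka partitions over K spout tasks round-robin,
    // the way storm-kafka distributes work (simplified illustration).
    static Map<Integer, List<Integer>> assign(int numPartitions, int numTasks) {
        Map<Integer, List<Integer>> byTask = new HashMap<>();
        for (int t = 0; t < numTasks; t++) byTask.put(t, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            byTask.get(p % numTasks).add(p); // partition p goes to task p mod K
        }
        return byTask;
    }

    public static void main(String[] args) {
        // 6 partitions over 4 spout tasks: tasks 0 and 1 get two partitions each.
        Map<Integer, List<Integer>> m = assign(6, 4);
        System.out.println(m.get(0)); // [0, 4]
        System.out.println(m.get(2)); // [2]
    }
}
```

So a parallelismHint larger than the partition count just leaves some spout tasks idle.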
Hi all,
I am experimenting with writing bolts in Python and was wondering how the
relationship between the Java and Python code works. For example, I have a
Python bolt that looks like this:
class ScanCountBolt(storm.BasicBolt):
    def __init__(self):
        # super(ScanCountBolt, self).__init__()
For parallel reads of massive historical data and high-volume writes you
could use a distributed DB with a SQL layer such as Apache HBase+Phoenix
(http://phoenix.incubator.apache.org/); I think it might complement Storm
nicely.
On Mon, Jun 2, 2014 at 10:19 AM, Nathan Leung ncle...@gmail.com wrote:
Raphael,
The number of partitions is defined in your Kafka configuration -
http://kafka.apache.org/documentation.html#brokerconfigs (num.partitions) -
or when you create the topic. The behavior is different for each version
of Kafka, so you should read more documentation. Your topology needs to
Oh ok. Thanks Chi!
Do you have any idea why my batch size never seems to get any bigger
than 83K tuples?
Currently I'm just using a barebones topology that looks like this:
Stream spout = topology.newStream(..., ...)
    .parallelismHint(...)
    .groupBy(new Fields("time"))
    .aggregate(new
Yes - if I used prepare or open on spouts or bolts it would work, but
unfortunately it would be a bit brittle. I'd have to include a spout or
bolt just for initializing my invariant code... I'd rather do that when the
topology is activated on the worker... so this seems like a good use of an
You don't have to include a specific bolt for init code. It's not difficult
to push your init code into a separate class and call it from your bolts,
lock on that class, run init, and then allow other instances to skip over
it.
Without changing bolt/spout code, I've taken to including a task hook
The ShellBolt looks for scancount.py in the resources/ directory in your
JAR, which will be extracted to each worker machine. It then simply invokes
python scancount.py in that directory. So you need to make sure the
scancount.py file will be on the classpath under resources/, as well as the
storm.py
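The Java side is then just a thin wrapper declaring the command and the output fields (a sketch in the storm-starter style; the field names are placeholders for whatever scancount.py actually emits):

```java
public static class ScanCountBolt extends ShellBolt implements IRichBolt {
    public ScanCountBolt() {
        // Tells Storm to launch "python scancount.py" from resources/.
        super("python", "scancount.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Output fields must be declared on the Java side,
        // even though the tuples are produced by the Python process.
        declarer.declare(new Fields("scan", "count"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
```

The two processes then talk over stdin/stdout using Storm's multilang JSON protocol, which storm.py implements for you.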
This looks promising, thanks.
Hope you don't mind one more question -- if I create my own
implementation of ITaskHook and add it to the config as you illustrated in
the prev. msg., will the prepare() method of my implementation be called
exactly once shortly after the StormTopology is deserialized?
No, it will be called per bolt instance. That's why init code needs to be
guarded behind a double-check lock to guarantee it only executes once per
JVM.
e.g.
private static volatile boolean initialized = false;
...
if (!initialized) {
    synchronized (MyInitCode.class) {
        if (!initialized) {
            runInit(); // your init code here
            initialized = true;
        }
    }
}
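To make the once-per-JVM guarantee concrete, here is a self-contained sketch of that pattern outside Storm (OnceInit and its counter are hypothetical names; the counter exists only to show that init ran once even under many concurrent callers):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OnceInit {
    private static volatile boolean initialized = false;
    static final AtomicInteger initCount = new AtomicInteger();

    // Double-checked lock: every bolt instance may call this,
    // but the init body runs exactly once per JVM.
    static void ensureInit() {
        if (!initialized) {
            synchronized (OnceInit.class) {
                if (!initialized) {
                    initCount.incrementAndGet(); // stand-in for real init work
                    initialized = true;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) pool.submit(OnceInit::ensureInit);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(initCount.get()); // prints 1
    }
}
```

The volatile flag makes the fast path safe without taking the lock after the first initialization.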
Hi Andrew,
From what I read in the code, executor.clj (worker) is
responsible for updating the stats for bolts and spouts. If a worker
is restarted, or it might be the case if a topology is rebalanced, there
is a chance of losing the stats. Topology stats derived from spouts
and
Hi all,
I am working on a project to build an order-processing pipeline on top of
Storm. In order to measure performance, for every spout/bolt and every
order processed by them, one requirement is to generate custom metrics in
the form: order id, order entry timestamp in milliseconds, order exit
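Storm 0.9.x has a built-in metrics API that can carry that kind of per-component measurement; a sketch of registering a custom latency metric in a bolt (assuming backtype.storm.metric.api on the classpath; the metric name and the "entry_ts" field are placeholders, not from the thread):

```java
public class OrderBolt extends BaseRichBolt {
    private transient ReducedMetric avgLatencyMs; // mean order latency per bucket
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Report the mean latency to the registered metrics consumer every 60 seconds.
        avgLatencyMs = new ReducedMetric(new MeanReducer());
        context.registerMetric("order_latency_ms", avgLatencyMs, 60);
    }

    @Override
    public void execute(Tuple tuple) {
        long entryMs = tuple.getLongByField("entry_ts"); // order entry timestamp (ms)
        avgLatencyMs.update(System.currentTimeMillis() - entryMs);
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}
```

You would pair this with a metrics consumer (e.g. LoggingMetricsConsumer) registered in the topology config to actually collect the values.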