Storm-kafka Integration

2014-06-02 Thread Komal Thombare
Hi, I am trying to integrate storm with kafka, but did not get any proper documentation to do so. Can anyone please help me for getting started with the integration. Thanks and Regards, Komal Thombare =-=-= Notice: The information contained in this e-mail message and/or

Re: Storm-kafka Integration

2014-06-02 Thread Andres Gomez
Do you use trindent or only storm??? Regards, Andres El 02/06/2014, a las 10:43, Komal Thombare komal.thomb...@tcs.com escribió: Hi, I am trying to integrate storm with kafka, but did not get any proper documentation to do so. Can anyone please help me for getting started with the

Re: Storm-kafka Integration

2014-06-02 Thread Komal Thombare
Only Storm Thanks and Regards, Komal Thombare Tata Consultancy Services Limited Ph:- 086-55388772 Mail-to: komal.thomb...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Consulting

Re: Storm-kafka Integration

2014-06-02 Thread Deepak Sharma
Hi Komal Have you looked at KafkaSpout? Thanks Deepak On Mon, Jun 2, 2014 at 2:13 PM, Komal Thombare komal.thomb...@tcs.com wrote: Hi, I am trying to integrate storm with kafka, but did not get any proper documentation to do so. Can anyone please help me for getting started with the

Re: Storm-kafka Integration

2014-06-02 Thread Komal Thombare
Hi Deepak, Yes i have. I have also got the storm-contrib source code, but then I am unaware of how to compile it. Thanks and Regards, Komal Thombare Tata Consultancy Services Limited Ph:- 086-55388772 Mail-to: komal.thomb...@tcs.com Website: http://www.tcs.com

Re: Storm-kafka Integration

2014-06-02 Thread Joe Stein
Please take a look at http://www.michael-noll.com/blog/2014/05/27/kafka-storm-integration-example-tutorial/#state-of-the-integration-game There is a github project too https://github.com/miguno/kafka-storm-starter This covers latest Storm, Kafka and Avro.

Re: Storm-kafka Integration

2014-06-02 Thread Andres Gomez
You should make a clone of this project: https://github.com/apache/incubator-storm/tree/master/external/storm-kafka and do “mvn install”, I suposse you use kafka 0.8.+ and then you do this: SpoutConfig spoutConfig = new SpoutConfig(); spoutConfig.zkServers = “localhost”; //zookeeper Host

Worker dies (bolt)

2014-06-02 Thread Margusja
Hi I am using apache-storm-0.9.1-incubating. I have simple topology: Spout reads from kafka topic and Bolt writes lines from spout to HBase. recently we did a test - we send 300 000 000 messages over kafka-rest - kafka-queue - storm topology - hbase. I noticed that around one hour and

Re: Storm-kafka Integration

2014-06-02 Thread Anis Nasir
Dear Komal, I used the storm-kafka version mentioned in the link. http://anisnasir.wordpress.com/2014/05/25/apache-storm-kafka-a-nice-choice/ Regards Anis On Mon, Jun 2, 2014 at 1:49 PM, Komal Thombare komal.thomb...@tcs.com wrote: Hi Andres, Still i am getting the same error. Thanks

Strange latency behaviour using 1 core (in a multiple-core processor)

2014-06-02 Thread dzacu1a
Hi, We are doing a benchmark test and limiting the numbers of core used by Storm. The topology contains 1 spout and 6 bolts. The first bolt has the highest load. Our experiment : we run the topology on one single machine with 4 cores. We start with 1 core and then increasing to 2, 3 and 4

Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Marc Vaillant
The bolt base classes have a prepare method: https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseBasicBolt.html and the spout base classes have a similar activate method: https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseRichSpout.html Is that

storm-kafka external project

2014-06-02 Thread Haralds Ulmanis
First , there is small typo kind of error in: https://github.com/apache/incubator-storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java line 217: if (lastCompletedOffset != lastCompletedOffset) { i guess there should be something like if (_committedTo !=

Re: Optimizing Kafka Stream

2014-06-02 Thread Raphael Hsieh
Thanks for the tips Chi, I'm a little confused about the partitioning. I had thought that the number of partitions was determined by the amount of parallelism in the topology. For example if I said .parallelismHint(4), then I would have 4 different partitions. Is this not the case ? Is there a set

Writing Bolts in Python

2014-06-02 Thread Ashu Goel
Hi all, I am experimenting with writing bolts in Python and was wondering how the relationship between the Java and Python code works. For example, I have a Python bolt that looks like this: class ScanCountBolt(storm.BasicBolt): def __init__(self): #super(ScanCountBolt,

Re: Storm with RDBMS

2014-06-02 Thread alex kamil
for parallel reads of massive historical data and high volume writes you could you a distributed db with SQL layer such as Apache Hbase+Phoenix http://phoenix.incubator.apache.org/, I think it might complement Storm nicely On Mon, Jun 2, 2014 at 10:19 AM, Nathan Leung ncle...@gmail.com wrote:

Re: Optimizing Kafka Stream

2014-06-02 Thread Chi Hoang
Raphael, The number of partitions is defined in your Kafka configuration - http://kafka.apache.org/documentation.html#brokerconfigs (num.partitions) - or when you create the topic. The behavior is different for each version of Kafka, so you should read more documentation. Your topology needs to

Re: Optimizing Kafka Stream

2014-06-02 Thread Raphael Hsieh
Oh ok. Thanks Chi! Do you have any ideas about why my batch size never seems to get any bigger than 83K tuples ? Currently I'm just using a barebones topology that looks like this: Stream spout = topology.newStream(..., ...) .parallelismHint() .groupBy(new Fields(time)) .aggregate(new

Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Chris Bedford
Yes.. if i used prepare or open on spouts or bolts it would work, but unfortunately it would be a bit brittle. I'd have to include a spout or bolt just for initializing my invariant code... i'd rather do that when the topology is activated on the worker.. so this seems like a good use of an

Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Michael Rose
You don't have to include a specific bolt for init code. It's not difficult to push your init code into a separate class and call it from your bolts, lock on that class, run init, and then allow other instances to skip over it. Without changing bolt/spout code, I've taken to including a task hook

Re: Writing Bolts in Python

2014-06-02 Thread Andrew Montalenti
The ShellBolt looks for scancount.py in the resources/ directory in your JAR, which will be extracted to each worker machine. It then simply invokes python scancount.py in that directory. So you need to make sure the scancount.py file will be on the classpath under resources/, as well the storm.py

Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Chris Bedford
This looks promising. thanks. hope you don't mind one more question -- if i create my own implementation of ITaskHook and add it do the config as you illustrated in prev. msg..will the prepare() method of my implementation be called exactly once shortly after StormTopology is deserialized

Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Michael Rose
No, it will be called per bolt instance. That's why init code needs to be guarded behind a double-check lock to guarantee it only executes once per JVM. e.g. private static volatile boolean initialized = false; ... if (!initialized) { synchronized(MyInitCode.class) { if

Re: Topology acked/emitted count reset

2014-06-02 Thread Harsha
Hi Andrew, From what I read in the code executor.clj (worker) is responsible for updating the stats for bolts and spouts . If a worker is restarted or it might be the case if a topology is rebalanced there is a chance of loosing the stats. Topology stats derived from spouts and

Custom metrics using IMetrics interface

2014-06-02 Thread Xueming Li
Hi all, I am working on a project to build an order processing pipeline on top of Storm. In order to measure performance, for every spout/bolt and every order processed by them, one requirement is to generate custom metrics in the form order id, order entry timestamp in milisec, order exit