Is kafka suitable for our architecture?

2014-10-09 Thread Albert Vila
Hi, I just came across Kafka while trying to find solutions to scale our current architecture. We are currently downloading and processing 6M documents per day from online and social media. We have a different workflow for each type of document, but some of the steps are keyword extraction,

Re: Is kafka suitable for our architecture?

2014-10-09 Thread William Briggs
Manually managing data locality will become difficult as you scale. Kafka is one potential tool you can use to help scale, but by itself, it will not solve your problem. If you need the data in near-real time, you could use a technology like Spark or Storm to stream data from Kafka and perform your

Re: how to identify rogue consumer

2014-10-09 Thread Jun Rao
Yes. Thanks, Jun On Wed, Oct 8, 2014 at 10:53 PM, Steven Wu stevenz...@gmail.com wrote: Jun, you mean trace level logging for requestAppender? log4j.logger.kafka.network.Processor=TRACE, requestAppender if it happens again, I can try to enable it. On Wed, Oct 8, 2014 at 9:54 PM, Jun Rao
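For reference, the request tracing mentioned above is normally turned on in the broker's log4j configuration; a minimal sketch, assuming the stock config/log4j.properties where requestAppender already writes to kafka-request.log:

    # send network-processor and request-logger traces to the request log
    log4j.logger.kafka.network.Processor=TRACE, requestAppender
    log4j.additivity.kafka.network.Processor=false
    log4j.logger.kafka.request.logger=TRACE, requestAppender
    log4j.additivity.kafka.request.logger=false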

Re: Is kafka suitable for our architecture?

2014-10-09 Thread Christian Csar
Apart from your data locality problem, it sounds like what you want is a work queue. Kafka's consumer structure doesn't lend itself too well to that use case, as a single partition of a topic should only have one consumer instance per logical subscriber of the topic, and that consumer would not be

Re: Reassigning Partition Failing

2014-10-09 Thread Lung, Paul
Actually, reassigning the replica does work, even if the broker the partition resides on is dead. My problem was that there was some unknown issue with the leader. When I restarted the leader broker, it worked. Paul On 10/6/14, 11:41 AM, Joe Stein joe.st...@stealth.ly wrote: Agreed, I think it

Re: Reassigning Partition Failing

2014-10-09 Thread Lung, Paul
Hi Joe, I simply restarted the leader broker, and things seem to work again. Thank you. Best, Paul Lung On 10/2/14, 1:26 AM, Joe Stein joe.st...@stealth.ly wrote: What version of zookeeper are you running? First check to see if there is a znode for the /admin/reassign_partitions in zookeeper.
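For readers in the same situation, the state of an in-flight reassignment can be checked before restarting anything; a hedged sketch (reassign.json stands in for whatever JSON file started the move):

    # from a zookeeper shell session, inspect the pending-reassignment znode
    get /admin/reassign_partitions

    # or ask the tool whether the reassignment has completed
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json --verify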

Re: MBeans, dashes, underscores, and KAFKA-1481

2014-10-09 Thread Neha Narkhede
I am going to vote for 1482 to be included in 0.8.2, if we have a patch submitted in a week. I think we've had this JIRA open for too long and we've held people back, so it's only fair to release this. On Wed, Oct 8, 2014 at 9:40 PM, Jun Rao jun...@gmail.com wrote: Otis, Just have the patch

create topic in multiple node kafka cluster

2014-10-09 Thread Sa Li
Hi all, I set up a 3-node kafka cluster on top of a 3-node zk ensemble. Now I launch 1 broker on each node; the brokers end up randomly distributed across the zk ensemble, see DO-mq-dev.1 [zk: localhost:2181(CONNECTED) 1] ls /brokers/ids [0, 1] pof-kstorm-dev1.2 [zk: localhost:2181(CONNECTED) 1] ls
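For context, a quick sanity check is to list the registered broker ids from each ZooKeeper node; in a real ensemble the answer should be identical everywhere. A rough sketch, using the host name above:

    bin/zookeeper-shell.sh DO-mq-dev:2181
      ls /brokers/ids
      # a 3-broker cluster on a healthy ensemble should show [0, 1, 2] from every node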

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Joel Koshy
It looks like you set up three separate ZK clusters, not an ensemble. You can take a look at http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkMulitServerSetup for how to set up an ensemble, and then register all three kafka brokers with that single zk ensemble. Joel On Thu, Oct 09,
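For anyone following along, a minimal sketch of the ensemble section of zoo.cfg, identical on all three machines (the third host name is a placeholder, and each machine also needs a myid file matching its server.N line):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=DO-mq-dev:2888:3888
    server.2=pof-kstorm-dev1:2888:3888
    server.3=pof-kstorm-dev2:2888:3888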

Re: refactoring ZK so it is plugable, would this make sense?

2014-10-09 Thread Jun Rao
This may not be easy since you have to implement things like watcher callbacks. What's your main concern with the ZK dependency? Thanks, Jun On Thu, Oct 9, 2014 at 8:20 AM, S Ahmed sahmed1...@gmail.com wrote: Hi, I was wondering if the zookeeper library (zkutils.scala etc) was designed in

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Guozhang Wang
Sa, Usually you would not want to set up kafka brokers on the same machines as the zk nodes, as that introduces dependent failures into the server cluster. Back to your original question, it seems your zk nodes do not form an ensemble, since otherwise their zk data would be the same. Guozhang On

Re: create topic in multiple node kafka cluster

2014-10-09 Thread Sa Li
Hi, I kind of doubt that I set it up as an ensemble, since it shows root@DO-mq-dev:/etc/zookeeper/conf# zkServer.sh status JMX enabled by default Using config: /etc/zookeeper/conf/zoo.cfg Mode: standalone The mode is standalone instead of something else; here is my zoo.cfg, I did follow the
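A common cause of Mode: standalone is that the server.N lines or the per-node myid files are missing; a hedged sketch of the fix on one node (dataDir is whatever zoo.cfg points at):

    # on the machine listed as server.1 (use 2 and 3 on the other nodes)
    echo 1 > /var/lib/zookeeper/myid
    zkServer.sh restart
    zkServer.sh status   # should now report Mode: leader or Mode: follower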

Re: refactoring ZK so it is plugable, would this make sense?

2014-10-09 Thread S Ahmed
I want the kafka features (without the redundancy), but to save $$ I don't want to have to run 3 zookeeper instances. On Thu, Oct 9, 2014 at 2:59 PM, Jun Rao jun...@gmail.com wrote: This may not be easy since you have to implement things like watcher callbacks. What's your main concern with the ZK

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-09 Thread Neha Narkhede
With SimpleConsumer, you will have to handle leader discovery as well as zookeeper-based rebalancing yourself. You can see an example here - https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example On Wed, Oct 8, 2014 at 11:45 AM, Sharninder sharnin...@gmail.com wrote: Thanks Gwen.
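For comparison, the high-level consumer handles leader discovery and group rebalancing itself; a rough Java sketch against the 0.8 API (topic, group id and ZooKeeper address are made up for illustration):

    import java.util.Collections;
    import java.util.Properties;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class GroupConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "my-group");  // instances sharing this id split the partitions
            props.put("auto.offset.reset", "smallest");

            ConsumerConnector consumer =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // one stream in this process; start more processes with the same group.id to load-balance
            KafkaStream<byte[], byte[]> stream =
                consumer.createMessageStreams(Collections.singletonMap("my-topic", 1))
                        .get("my-topic").get(0);

            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            while (it.hasNext()) {
                System.out.println(new String(it.next().message()));
            }
        }
    }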

including KAFKA-1555 in 0.8.2?

2014-10-09 Thread Jun Rao
Hi, Everyone, I just committed KAFKA-1555 (min.isr support) to trunk. I felt that it's probably useful to include it in the 0.8.2 release. Any objections? Thanks, Jun
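For anyone looking for the resulting knobs, a hedged sketch of how min.isr is typically wired up once KAFKA-1555 is in (topic name is a placeholder); the topic-level setting only takes effect for producers that require acks from all in-sync replicas:

    # require at least 2 in-sync replicas before a write is accepted
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
      --config min.insync.replicas=2

    # old-producer config: wait for all in-sync replicas to acknowledge
    request.required.acks=-1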

Clarification about Custom Encoder/Decoder for serialization

2014-10-09 Thread Abraham Jacob
Hi All, I wanted to get some clarification on Kafka's Encoder/Decoder usage. Let's say I want to implement a custom Encoder. public class CustomMessageSerializer implements Encoder<MyCustomObject> { @Override public byte[] toBytes(String arg0) { // serialize the MyCustomObject return
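A minimal sketch of what such an encoder usually looks like against the 0.8 producer API (MyCustomObject and the serialization body are placeholders taken from the question; the VerifiableProperties constructor is what the producer expects when it instantiates the class by reflection):

    import java.nio.charset.StandardCharsets;
    import kafka.serializer.Encoder;
    import kafka.utils.VerifiableProperties;

    public class CustomMessageSerializer implements Encoder<MyCustomObject> {

        // invoked reflectively by the producer, with its configuration
        public CustomMessageSerializer(VerifiableProperties props) {
        }

        @Override
        public byte[] toBytes(MyCustomObject obj) {
            // placeholder serialization; real code would use Avro, JSON, etc.
            return obj.toString().getBytes(StandardCharsets.UTF_8);
        }
    }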

Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Bhavesh Mistry
Hi Kafka, We have lots of lingering console consumer groups that people have created for testing or debugging purposes, for one-time use via bin/kafka-console-consumer.sh. Is there an auto-purging clean-up script that Kafka provides? Is there any API to find out inactive consumer groups and delete

Re: Auto Purging Consumer Group Configuration [Especially Kafka Console Group]

2014-10-09 Thread Gwen Shapira
The problem with Kafka is that we never know when a consumer is truly inactive. But - if you decide to define inactive as a consumer whose last offset is lower than anything available on the log (or perhaps lagging by over X messages?), it's fairly easy to write a script to detect and clean them
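A hedged sketch of the kind of check Gwen describes, using tools that ship with 0.8 (the group name is a placeholder, and the rmr step is destructive, so review the lag output first):

    # per-partition consumer offset, log-end offset and lag for one group
    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
      --zookeeper localhost:2181 --group console-consumer-12345

    # if the group is judged dead, drop its state from ZooKeeper
    bin/zookeeper-shell.sh localhost:2181
      rmr /consumers/console-consumer-12345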

Re: Clarification about Custom Encoder/Decoder for serialization

2014-10-09 Thread Jun Rao
The encoder is instantiated once when the producer is constructed. Thanks, Jun On Thu, Oct 9, 2014 at 6:45 PM, Abraham Jacob abe.jac...@gmail.com wrote: Hi All, I wanted to get some clarification on Kafka's Encoder/Decoder usage. Let's say I want to implement a custom Encoder. public
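In other words, the class is named in the producer config and built once inside the Producer constructor; a small hedged sketch following the example above (class, topic and broker address are illustrative):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProducerWiringSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");
            // the encoder named here is constructed exactly once, by the Producer constructor below
            props.put("serializer.class", "com.example.CustomMessageSerializer");

            Producer<byte[], MyCustomObject> producer =
                new Producer<byte[], MyCustomObject>(new ProducerConfig(props));
            producer.send(new KeyedMessage<byte[], MyCustomObject>("my-topic", new MyCustomObject()));
            producer.close();
        }
    }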