Re: [VOTE] Kafka 0.8.0 Beta 1 (Candidate 1)

2013-06-13 Thread Jun Rao
+1. Verified unit tests and quick start. Thanks, Jun On Thu, Jun 13, 2013 at 2:41 PM, Joe Stein wrote: > Hello, this is the first candidate release for Kafka 0.8.0 Beta 1 > > This release fixes the following issues > > http://people.apache.org/~joestein/kafka-0.8.0-beta1-candidate1/RELEASE_NO

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Jun -- thanks. We're using 0.72. No, the messages are not compressed, and since we do appear to be seeing dupes in our tests, it indicates our own code is buggy. Thanks, Philip On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao wrote: > Are your messages compressed in batches? If so, some dups are expect

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Jun Rao
Are your messages compressed in batches? If so, some dups are expected during rebalance. In 0.8, such dups are eliminated. Other than that, rebalance shouldn't cause dups since we commit consumed offsets to ZK before doing a rebalance. Thanks, Jun On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole

Re: Kafka 0.8 Maven and IntelliJ

2013-06-13 Thread Jun Rao
Thanks. Which version of IntelliJ are you using? Jun On Thu, Jun 13, 2013 at 10:20 AM, Dragos Manolescu < dragos.manole...@servicenow.com> wrote: > Hmm, I've just pulled 0.8.0-beta1-candidate1, removed .idea* from my > top-level directory, executed gen-idea, and then opened and built the > proj

EncoderFactory is static; is that the problem?

2013-06-13 Thread Gaurang Jhawar
Hey, 1002 Name Shyam Skills Cooking 1003 Name Jack Skills PHP I am reading this XML file and parsing the data ... Now I'm trying to put it into an Avro data file (well, not exactly), trying to send it over the server using Kafka ... But in the code below .. Parser parse=

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Just to be clear, I'm not asking that we solve "duplicate messages on crash before commit to Zookeeper", just an apparent problem where if Kafka has some data, and we start on ConsumerConnectors, we get dupe data since some Consumers come up before others. Any help? Philip On Thu, Jun 13, 2013 a

Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Hello -- is it possible for our code to stall a ConsumerConnector from doing any consuming for, say, 30 seconds, until we can be sure that all other ConsumerConnectors are rebalanced? It seems that the first ConsumerConnector to come up is prefetching some data, and we end up with duplicate message

Re: Using Kafka for "data" messages

2013-06-13 Thread Taylor Gautier
Spot on. This was one of the areas that we had to work around. Remember that there is a 1:1 relationship of topics to directories and most file systems don't like 10s of thousands of directories. We found in practice that 60k per machine was a practical limit using, I believe, EXT3FS. On Thursday
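The directory limit described above can be turned into a quick back-of-the-envelope check. This is only a sketch: the function names are hypothetical, the 60,000 figure is the practical limit reported in the message (not a Kafka constant), and the partitions-per-topic multiplier is an added assumption based on each topic partition being stored in its own directory on the broker.

```python
# Rough capacity check for a topic-per-user design.
# Assumption: directories on a broker ~= topics * partitions per topic.
# PRACTICAL_DIR_LIMIT is the figure reported in the message above,
# not a value defined anywhere in Kafka.
PRACTICAL_DIR_LIMIT = 60_000


def directories_needed(topics: int, partitions_per_topic: int = 1) -> int:
    """Estimate how many log directories a broker would hold."""
    return topics * partitions_per_topic


def fits_on_one_machine(
    topics: int,
    partitions_per_topic: int = 1,
    limit: int = PRACTICAL_DIR_LIMIT,
) -> bool:
    """Return True if the estimated directory count stays within the limit."""
    return directories_needed(topics, partitions_per_topic) <= limit
```

For example, fits_on_one_machine(100_000) returns False, which is why a topic per concurrent user becomes a concern well before other resources run out.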

Re: Producer only finding partition on 1 of 2 Brokers, even though ZK shows 1 partition exists on both Brokers?

2013-06-13 Thread Brett Hoerner
You know what, it's likely this is all because I'm running a bad fork of Kafka 0.7.2 for Scala 2.10 (on the producers/consumers) since that's the version we've standardized on. Behavior in 2.9.2 with the official Kafka 0.7.2 release seems much more normal -- I'm working on downgrading all our clie

Re: Using Kafka for "data" messages

2013-06-13 Thread Josh Foure
Ah yes, I had read that Kafka likes under 1,000 topics but I wasn't sure if that was really a limitation.  In principle I wouldn't mind having all guest events placed on the "GUEST_DATA" queue but I thought that by having more topics I could minimize having consumers read messages only to discar

Re: Using Kafka for "data" messages

2013-06-13 Thread Timothy Chen
Also since you're going to be creating a topic per user, the number of concurrent users will also be a concern to Kafka as it doesn't like massive amounts of topics. Tim On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure wrote: > Hi Mahendra, I think that is where it gets a little tricky. I think i

Re: Using Kafka for "data" messages

2013-06-13 Thread Josh Foure
Hi Mahendra, I think that is where it gets a little tricky.  I think it would work something like this: 1.  Web sends login event for user "user123" to topic "GUEST_EVENT". 2.  All of the systems consume those messages and publish the data messages to topic "GUEST_DATA.user123". 3.  The Recommen

Re: Producer only finding partition on 1 of 2 Brokers, even though ZK shows 1 partition exists on both Brokers?

2013-06-13 Thread Brett Hoerner
As an update, this continues to affect us. First I'd like to note ways in which my issue seems different from KAFKA-278: * I did not add a new broker or a new topic, this topic has been in use on two existing brokers for months * The topic definitely exists on both brokers. The topic/data direct

Re: Kafka 0.8 Maven and IntelliJ

2013-06-13 Thread Dragos Manolescu
Hmm, I've just pulled 0.8.0-beta1-candidate1, removed .idea* from my top-level directory, executed gen-idea, and then opened and built the project in IntelliJ w/o problems. I noticed that the build uses an old version of the sbt-idea plugin: addSbtPlugin("com.github.mpeltonen" % "sbt-idea

Re: Using Kafka for "data" messages

2013-06-13 Thread Mahendra M
Hi Josh, The idea looks very interesting. I just had one doubt. 1. A user logs in. His login id is sent on a topic 2. Other systems (consumers on this topic) consume this message and publish their results to another topic This will be happening without any particular order for hundreds of users

RE: shipping logs to s3 or other servers for backups

2013-06-13 Thread S Ahmed
Hi, In my application, I am storing user events, and I want to partition the storage by day. So at the end of a day, I want to take that "file" and ship it to s3 or another server as a backup. This way I can replay the events for a specific day if needed. These events also have to be in order.

Re: Using Kafka for "data" messages

2013-06-13 Thread Taylor Gautier
I've been talking about this kind of architecture for years. As you said it's an EDA architecture. You might also want to have a look at Esper if you haven't already - it's a perfect complement to this strategy. At my last job I built a relatively low latency site wide pub sub system that showed

Re: Arguments for Kafka over RabbitMQ ?

2013-06-13 Thread Jonathan Hodges
Hi Alexis, This was very helpful and I also appreciate both yours and Tim's input here. It clears up the cases for when to use Rabbit or Kafka. What is great is they are both open source with vibrant communities behind them. -Jonathan Go On Jun 13, 2013 8:45 AM, "Alexis Richardson" wrote: >

Using Kafka for "data" messages

2013-06-13 Thread Josh Foure
  Hi all, my team is proposing a novel way of using Kafka and I am hoping someone can help do a sanity check on this:   1.  When a user logs into our website, we will create a “logged in” event message in Kafka containing the user id.  2.  30+ systems (consumers each in their own consumer groups)

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I have, but this is a different thing. It's related to ports and security groups and not to the bind addresses. It's solved now. Thanks On 13 June 2013 15:42, Jun Rao wrote: > Have you looked at #3 in http://kafka.apache.org/faq.html? > > Thanks, > > Jun > > > On Thu, Jun 13, 2013 at 6:41 A

Re: Arguments for Kafka over RabbitMQ ?

2013-06-13 Thread Alexis Richardson
Hi all, First, thanks to Tim (from Rabbit) and Jonathan for moving this thread along. Jonathan, I hope you found my links to the data model docs, and Tim's replies, helpful. Has everyone got what they wanted from this thread? alexis On Tue, Jun 11, 2013 at 5:49 PM, Jonathan Hodges wrote: > H

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Jun Rao
Have you looked at #3 in http://kafka.apache.org/faq.html? Thanks, Jun On Thu, Jun 13, 2013 at 6:41 AM, Alexandre Rodrigues < alexan...@blismedia.com> wrote: > I think I know what's happening: > > I tried to run both brokers and ZK on the same machine and it worked. I > also attempted to do th

Re: 0.8 Durability Question

2013-06-13 Thread Jonathan Hodges
Thanks! On Thu, Jun 13, 2013 at 8:33 AM, Neha Narkhede wrote: > No. It only means that messages are written to all replicas in memory. Data > is flushed to disk asynchronously. > > Thanks, > Neha > On Jun 13, 2013 3:29 AM, "Jonathan Hodges" wrote: > > > Looking at Jun’s ApacheCon slides ( > > h

Re: 0.8 Durability Question

2013-06-13 Thread Neha Narkhede
No. It only means that messages are written to all replicas in memory. Data is flushed to disk asynchronously. Thanks, Neha On Jun 13, 2013 3:29 AM, "Jonathan Hodges" wrote: > Looking at Jun’s ApacheCon slides ( > http://www.slideshare.net/junrao/kafka-replication-apachecon2013) slide 21 > title

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I think I know what's happening: I tried to run both brokers and ZK on the same machine and it worked. I also attempted to do the same but with a ZK node on another machine and it also worked. My guess is something related to ports. All the machines are on EC2 and there might be something related

Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-13 Thread Philip O'Toole
Jun - thanks again. This is very helpful. Philip On Jun 12, 2013, at 9:50 PM, Jun Rao wrote: > Actually, you are right. This can happen on a single topic too, if you have > more than one consumer thread. Each consumer thread pulls data from a > blocking queue, one or more fetchers are putting

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
I've tried the console producer, so I will assume it's not related to the producer. I keep seeing the same entries in the producer from time to time: [2013-06-13 11:04:00,670] WARN Error while fetching metadata [{TopicMetadata for topic C -> No partition metadata for topic C due to kafka.commo

0.8 Durability Question

2013-06-13 Thread Jonathan Hodges
Looking at Jun’s ApacheCon slides ( http://www.slideshare.net/junrao/kafka-replication-apachecon2013) slide 21 titled, ‘Data Flow in Replication’ there are three possible durability configurations which trade off latency for greater persistence guarantees. The third row is the ‘no data loss’ config

Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-13 Thread Alexandre Rodrigues
Hi Jun, I was using the 0.8 branch, 2 commits behind, but now I am using the latest with the same issue. 3 topics A,B,C, created automatically with a replication factor of 2 and 2 partitions. 2 brokers (0 and 1). The list of topics in ZooKeeper is the following: topic: A partition: 0 leader: 1

Re: Versioning Schemas

2013-06-13 Thread Shone Sadler
Thanks Jun & Phil! Shone On Thu, Jun 13, 2013 at 12:00 AM, Jun Rao wrote: > Yes, we just have a customized encoder that encodes the first 4 bytes of md5 > of the schema, followed by Avro bytes. > > Thanks, > > Jun > > > On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler > wrote: > > > Jun, > > I like
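The framing Jun describes in the quoted reply (first 4 bytes of the schema's MD5, followed by the Avro bytes) might look roughly like the sketch below. It is not the actual encoder from the thread: the function names are hypothetical, the Avro payload is treated as opaque, already-serialized bytes, and hashing the schema's JSON text as UTF-8 is an assumption.

```python
import hashlib
from typing import Tuple

FINGERPRINT_LEN = 4  # "first 4 bytes of md5 of the schema", per the reply


def schema_fingerprint(schema_json: str) -> bytes:
    """Return the first 4 bytes of the MD5 digest of the schema text."""
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:FINGERPRINT_LEN]


def frame(schema_json: str, avro_payload: bytes) -> bytes:
    """Prefix already-serialized Avro bytes with the schema fingerprint."""
    return schema_fingerprint(schema_json) + avro_payload


def unframe(message: bytes) -> Tuple[bytes, bytes]:
    """Split a framed message into (fingerprint, avro_payload)."""
    if len(message) < FINGERPRINT_LEN:
        raise ValueError("message is shorter than the fingerprint prefix")
    return message[:FINGERPRINT_LEN], message[FINGERPRINT_LEN:]
```

A consumer could keep a dictionary from fingerprint to schema text, call unframe on each message, and pick the matching schema before decoding the payload. With only 4 bytes, collisions between schemas are unlikely but possible, so such a lookup table should reject two different schemas that map to the same prefix.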