Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Jun Rao
Actually, you are right. This can happen on a single topic too, if you have
more than one consumer thread. Each consumer thread pulls data from a
blocking queue, one or more fetchers are putting data into the queue. Say,
you have two consumer threads and two partitions from the same broker.
There is a single fetcher that fetches both partitions, and it puts each
partition's data into a separate queue. So, if one thread stops consuming
data, its queue will fill up at some point. This will block the fetcher
from putting data into the other queue.
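A toy model of this behaviour (a Python stand-in for the 0.7 fetcher and its per-stream blocking queues; the queue size and names are illustrative, not Kafka internals):

```python
import queue
import threading
import time

# One fetcher feeds two bounded queues, one per consumer thread/stream.
QUEUE_SIZE = 2
q1 = queue.Queue(maxsize=QUEUE_SIZE)  # consumer thread 1's stream (stalled)
q2 = queue.Queue(maxsize=QUEUE_SIZE)  # consumer thread 2's stream

delivered_to_q2 = 0

def fetcher():
    global delivered_to_q2
    # The single fetcher alternates between the queues. put() blocks once a
    # queue is full, so a stalled consumer on q1 eventually starves q2 too.
    for i in range(10):
        q1.put(i)  # nobody drains q1, so this blocks after QUEUE_SIZE puts
        q2.put(i)
        delivered_to_q2 += 1

threading.Thread(target=fetcher, daemon=True).start()
time.sleep(0.5)  # consumer 1 is "blocked" and never calls q1.get()
# delivered_to_q2 is now stuck at QUEUE_SIZE: q2 stops receiving data as well
```

In 0.7, the analogue of `q1.get()` is the stream's next() call; a thread that stops calling next() plays the role of the stalled consumer here.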

Thanks,

Jun


On Wed, Jun 12, 2013 at 9:10 PM, Philip O'Toole  wrote:

> Jun -- thanks.
>
> But if the topic is the same, doesn't each thread get a partition?
> Isn't that how it works?
>
> Philip
>
> On Wed, Jun 12, 2013 at 9:08 PM, Jun Rao  wrote:
> > Yes, when the consumer is consuming multiple topics, if one thread stops
> > consuming topic 1, it can prevent new data getting into the consumer for
> > topic 2.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jun 12, 2013 at 7:43 PM, Philip O'Toole 
> wrote:
> >
> >> Hello -- we're using 0.72. We're looking at the source, but want to be
> >> sure. :-)
> >>
> >> We create a single ConsumerConnector, call createMessageStreams, and
> >> hand the streams off to individual threads. If one of those threads
> >> calls next() on a stream, gets some messages, and then *blocks* in
> >> some subsequent operation (and blocks for minutes), can it potentially
> >> cause all other threads (calling next() on other streams) to block
> >> too? Does something inside the ConsumerConnector block all other
> >> stream processing? This would explain some behaviour we're seeing.
> >>
> >> Thanks,
> >>
> >> Philip
> >>
>


Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
Also, what is it in the ConsumerConnector that causes this behaviour?

I'm proposing here that we move to a model of one ConsumerConnector per
thread. This will decouple the flow for each partition, and allow each
to make progress independently, right? We only have one topic on the cluster.
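A toy model of the proposed decoupling (again a Python stand-in, not the Kafka API): with one independent feeder per queue, a stalled consumer stalls only its own stream:

```python
import queue
import threading

def make_stream(n_items):
    # One feeder thread per stream, mimicking one ConsumerConnector (and
    # therefore one dedicated fetcher) per consumer thread.
    q = queue.Queue(maxsize=2)
    def feeder():
        for i in range(n_items):
            q.put(i)  # can only block on *this* stream's queue
    threading.Thread(target=feeder, daemon=True).start()
    return q

stalled_stream = make_stream(10)  # its consumer never drains it
healthy_stream = make_stream(10)

# The healthy consumer drains its stream completely, unaffected by the
# stalled one.
consumed = [healthy_stream.get() for _ in range(10)]
```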

Thanks,

Philip

On Wed, Jun 12, 2013 at 9:10 PM, Philip O'Toole  wrote:
> Jun -- thanks.
>
> But if the topic is the same, doesn't each thread get a partition?
> Isn't that how it works?
>
> Philip
>
> On Wed, Jun 12, 2013 at 9:08 PM, Jun Rao  wrote:
>> Yes, when the consumer is consuming multiple topics, if one thread stops
>> consuming topic 1, it can prevent new data getting into the consumer for
>> topic 2.
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Jun 12, 2013 at 7:43 PM, Philip O'Toole  wrote:
>>
>>> Hello -- we're using 0.72. We're looking at the source, but want to be
>>> sure. :-)
>>>
>>> We create a single ConsumerConnector, call createMessageStreams, and
>>> hand the streams off to individual threads. If one of those threads
>>> calls next() on a stream, gets some messages, and then *blocks* in
>>> some subsequent operation (and blocks for minutes), can it potentially
>>> cause all other threads (calling next() on other streams) to block
>>> too? Does something inside the ConsumerConnector block all other
>>> stream processing? This would explain some behaviour we're seeing.
>>>
>>> Thanks,
>>>
>>> Philip
>>>


Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
Jun -- thanks.

But if the topic is the same, doesn't each thread get a partition?
Isn't that how it works?

Philip

On Wed, Jun 12, 2013 at 9:08 PM, Jun Rao  wrote:
> Yes, when the consumer is consuming multiple topics, if one thread stops
> consuming topic 1, it can prevent new data getting into the consumer for
> topic 2.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 7:43 PM, Philip O'Toole  wrote:
>
>> Hello -- we're using 0.72. We're looking at the source, but want to be
>> sure. :-)
>>
>> We create a single ConsumerConnector, call createMessageStreams, and
>> hand the streams off to individual threads. If one of those threads
>> calls next() on a stream, gets some messages, and then *blocks* in
>> some subsequent operation (and blocks for minutes), can it potentially
>> cause all other threads (calling next() on other streams) to block
>> too? Does something inside the ConsumerConnector block all other
>> stream processing? This would explain some behaviour we're seeing.
>>
>> Thanks,
>>
>> Philip
>>


Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Jun Rao
Yes, when the consumer is consuming multiple topics, if one thread stops
consuming topic 1, it can prevent new data from getting into the consumer for
topic 2.

Thanks,

Jun


On Wed, Jun 12, 2013 at 7:43 PM, Philip O'Toole  wrote:

> Hello -- we're using 0.72. We're looking at the source, but want to be
> sure. :-)
>
> We create a single ConsumerConnector, call createMessageStreams, and
> hand the streams off to individual threads. If one of those threads
> calls next() on a stream, gets some messages, and then *blocks* in
> some subsequent operation (and blocks for minutes), can it potentially
> cause all other threads (calling next() on other streams) to block
> too? Does something inside the ConsumerConnector block all other
> stream processing? This would explain some behaviour we're seeing.
>
> Thanks,
>
> Philip
>


Re: Kafka 0.8 Maven and IntelliJ

2013-06-12 Thread Jun Rao
Dragos,

After the sbt upgrade 3-4 months ago, some of us have been struggling to get
the Kafka code cleanly loaded into IntelliJ after running "./sbt gen-idea".
Were you able to do that successfully?

Thanks,

Jun


On Wed, Jun 12, 2013 at 10:45 AM, Dragos Manolescu <
dragos.manole...@servicenow.com> wrote:

> For IntelliJ I've always used the gen-idea sbt plugin:
> https://github.com/mpeltonen/sbt-idea
>
> -Dragos
>
>
> On 6/11/13 10:41 PM, "Jason Rosenberg"  wrote:
>
> >Try the one under core/targets?
> >
> >
> >On Tue, Jun 11, 2013 at 3:34 PM, Florin Trofin  wrote:
> >
> >> I downloaded the latest 0.8 snapshot and I want to build using Maven:
> >>
> >> ./sbt make-pom
> >>
> >> Generates a bunch of pom.xml files but when I try to open one of them in
> >> IntelliJ they are not recognized. Do I need to do any other step? Which
> >> pom do I need to open?
> >>
> >> Thanks!
> >>
> >> Florin
> >>
> >>
>
>


Re: Versioning Schema's

2013-06-12 Thread Jun Rao
Yes, we just have a customized encoder that encodes the first 4 bytes of the
md5 of the schema, followed by the Avro bytes.
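A minimal sketch of that encoding (the helper names and the in-memory registry are hypothetical, not LinkedIn's actual code): prefix the serialized record with the first 4 bytes of the schema's MD5, and look the schema back up by that id on the consumer side.

```python
import hashlib

def schema_id(schema_json: str) -> bytes:
    """First 4 bytes of the MD5 of the schema text."""
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:4]

def encode(schema_json: str, avro_bytes: bytes) -> bytes:
    # 4-byte schema id, then the Avro-serialized record bytes.
    return schema_id(schema_json) + avro_bytes

def decode(message: bytes, registry: dict):
    """registry maps 4-byte ids to schema text; returns (schema, payload)."""
    return registry[message[:4]], message[4:]

schema = '{"type": "record", "name": "Event", "fields": []}'
registry = {schema_id(schema): schema}

msg = encode(schema, b"avro-encoded-bytes")
found, payload = decode(msg, registry)  # found is the original schema text
```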

Thanks,

Jun


On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler wrote:

> Jun,
> I like the idea of an explicit version field, if the schema can be derived
> from the topic name itself. The storage (say 1-4 bytes) would require less
> overhead than a 128 bit md5 at the added cost of managing the version#.
>
> Is it correct to assume that your applications are using two schemas then,
> one system level schema to deserialize the schema id and bytes for the
> application message and a second schema to deserialize those bytes with the
> application schema?
>
> Thanks again!
> Shone
>
>
> On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao  wrote:
>
> > Actually, currently our schema id is the md5 of the schema itself. Not
> > fully sure how this compares with an explicit version field in the
> schema.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao  wrote:
> >
> > > At LinkedIn, we are using option 2.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler  > >wrote:
> > >
> > >> Hello everyone,
> > >>
> > >> After doing some searching on the mailing list for best practices on
> > >> integrating Avro with Kafka there appears to be at least 3 options for
> > >> integrating the Avro Schema; 1) embedding the entire schema within the
> > >> message 2) embedding a unique identifier for the schema in the message
> > and
> > >> 3) deriving the schema from the topic/resource name.
> > >>
> > >> Option 2 appears to be the best option in terms of both efficiency and
> > >> flexibility.  However, from a programming perspective it complicates the
> > >> solution with the need for both an envelope schema (containing a "schema
> > >> id" and "bytes" field for record data) and message schema (containing the
> > >> application specific message fields).  This requires two levels of
> > >> serialization/deserialization.
> > >> Questions:
> > >> 1) How are others dealing with versioning of schemas?
> > >> 2) Is there a more elegant means of embedding schema ids in an Avro
> > >> message (I am new to both currently ;-)?
> > >>
> > >> Thanks in advance!
> > >>
> > >> Shone
> > >>
> > >
> > >
> >
>


Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-12 Thread Jun Rao
Any error in state-change.log? Also, are you using the latest code in the
0.8 branch?

Thanks,

Jun


On Wed, Jun 12, 2013 at 9:27 AM, Alexandre Rodrigues <
alexan...@blismedia.com> wrote:

> Hi Jun,
>
> Thanks for your prompt answer. The producer yields those errors in the
> beginning, so I think the topic metadata refresh has nothing to do with it.
>
> The problem is one of the brokers isn't leader on any partition assigned to
> it and because topics were created with a replication factor of 1, the
> producer will never connect to that broker at all. What I don't understand
> is why the broker doesn't assume leadership of those partitions.
>
> I deleted all the topics and tried now with a replication factor of two
>
> topic: A  partition: 0leader: 1   replicas: 1,0   isr: 1
> topic: A  partition: 1leader: 0   replicas: 0,1   isr: 0,1
> topic: B partition: 0leader: 0   replicas: 0,1   isr: 0,1
> topic: B partition: 1leader: 1   replicas: 1,0   isr: 1
> topic: C  partition: 0leader: 1   replicas: 1,0   isr: 1
> topic: C  partition: 1leader: 0   replicas: 0,1   isr: 0,1
>
>
> Now producer doesn't yield errors. However, one of the brokers ( broker 0 )
> generates lots of lines like this:
>
> [2013-06-12 16:19:41,805] WARN [KafkaApi-0] Produce request with
> correlation id 404999 from client  on partition [B,0] failed due to
> Partition [B,0] doesn't exist on 0 (kafka.server.KafkaApis)
>
> There should be a replica there, so I don't know why it complains about
> that message.
>
> Have you ever found anything like this?
>
>
>
> On 12 June 2013 16:27, Jun Rao  wrote:
>
> > If the leaders exist in both brokers, the producer should be able to
> > connect to both of them, assuming you don't provide any key when sending
> > the data. Could you try restarting the producer? If there has been broker
> > failures, it may take topic.metadata.refresh.interval.ms for the
> producer
> > to pick up the newly available partitions (see
> > http://kafka.apache.org/08/configuration.html for details).
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jun 12, 2013 at 8:01 AM, Alexandre Rodrigues <
> > alexan...@blismedia.com> wrote:
> >
> > > Hi,
> > >
> > > I have a Kafka 0.8 cluster with two nodes connected to three ZKs, with
> > the
> > > same configuration but the brokerId (one is 0 and the other 1). I
> created
> > > three topics A, B and C with 4 partitions and a replication factor of
> 1.
> > My
> > > idea was to have 2 partitions per topic in each broker. However, when I
> > > connect a producer, I can't get both brokers to write at the same time,
> > > and I don't know what's going on.
> > >
> > > My server.config has the following entries:
> > >
> > > auto.create.topics.enable=true
> > > num.partitions=2
> > >
> > >
> > > When I run bin/kafka-list-topic.sh --zookeeper localhost:2181   I get
> the
> > > following partition leader assignments:
> > >
> > > topic: A  partition: 0leader: 1   replicas: 1 isr: 1
> > > topic: A  partition: 1leader: 0   replicas: 0 isr: 0
> > > topic: A  partition: 2leader: 1   replicas: 1 isr: 1
> > > topic: A  partition: 3leader: 0   replicas: 0 isr: 0
> > > topic: B partition: 0leader: 0   replicas: 0 isr: 0
> > > topic: B partition: 1leader: 1   replicas: 1 isr: 1
> > > topic: B partition: 2leader: 0   replicas: 0 isr: 0
> > > topic: B partition: 3leader: 1   replicas: 1 isr: 1
> > > topic: C  partition: 0leader: 0   replicas: 0 isr: 0
> > > topic: C  partition: 1leader: 1   replicas: 1 isr: 1
> > > topic: C  partition: 2leader: 0   replicas: 0 isr: 0
> > > topic: C  partition: 3leader: 1   replicas: 1 isr: 1
> > >
> > >
> > > I've forced reassignment using the kafka-reassign-partitions tool with
> > the
> > > following JSON:
> > >
> > > {"partitions":  [
> > >{"topic": "A", "partition": 1, "replicas": [0] },
> > >{"topic": "A", "partition": 3, "replicas": [0] },
> > >{"topic": "A", "partition": 0, "replicas": [1] },
> > >{"topic": "A", "partition": 2, "replicas": [1] },
> > >{"topic": "B", "partition": 1, "replicas": [0] },
> > >{"topic": "B", "partition": 3, "replicas": [0] },
> > >{"topic": "B", "partition": 0, "replicas": [1] },
> > >{"topic": "B", "partition": 2, "replicas": [1] },
> > >{"topic": "C", "partition": 0, "replicas": [0] },
> > >{"topic": "C", "partition": 1, "replicas": [1] },
> > >{"topic": "C", "partition": 2, "replicas": [0] },
> > >{"topic": "C", "partition": 3, "replicas": [1] }
> > > ]}
> > >
> > > After reassignment, I've restarted producer and nothing worked. I've
> > tried
> > > also to restart both brokers and producer and nothing.
> > >
> > > The producer log contains entries like these:
> > >
> > > 2013-06-12 14:48:46,467] WARN Error while fetching metadata
>  partition
> > 0
> > > leader: nonereplicas: 

Re: kafka 0.8.0 beta release

2013-06-12 Thread Jun Rao
Joe,

KAFKA-937 is now committed to 0.8. Could you start the release process for
0.8.0 beta?

Thanks,

Jun


On Wed, Jun 12, 2013 at 7:54 AM, Jun Rao  wrote:

> Hi, Everyone,
>
> Joe Stein has kindly agreed to drive this release. We'd like to
> get KAFKA-937 committed to 0.8 (should happen today) and then call a vote.
>
> Thanks,
>
> Jun
>


Re: How to compile java files in examples directory bundled in kafka 0.8?

2013-06-12 Thread Nandigam, Sujitha
I tried running the Java examples using the README, but when I run ./sbt I get
the following error.

Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory 
file:
   /tmp/hsperfdata_root/1312
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

Could you please tell me how to resolve this?
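One likely workaround, following the JVM's own hint (the directory path below is only an example; any writable location on a filesystem with free space should do):

```shell
# Create a temp directory on a filesystem that has space (example path).
mkdir -p "$HOME/sbt-tmp"

# Point the JVM at it; sbt launchers commonly pass JVM options via SBT_OPTS.
export SBT_OPTS="-Djava.io.tmpdir=$HOME/sbt-tmp"

# ./sbt   # then re-run sbt as before
```

If the bundled ./sbt wrapper script does not read SBT_OPTS, the -Djava.io.tmpdir option can be added to the java invocation inside the script directly.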

Sujitha


One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
Hello -- we're using 0.72. We're looking at the source, but want to be sure. :-)

We create a single ConsumerConnector, call createMessageStreams, and
hand the streams off to individual threads. If one of those threads
calls next() on a stream, gets some messages, and then *blocks* in
some subsequent operation (and blocks for minutes), can it potentially
cause all other threads (calling next() on other streams) to block
too? Does something inside the ConsumerConnector block all other
stream processing? This would explain some behaviour we're seeing.

Thanks,

Philip


Re: Kafka 0.8 Maven and IntelliJ

2013-06-12 Thread Florin Trofin
Thanks Dragos -- I've used that plugin before. It works on a developer's
machine for building and debugging the project, but I also need this to work
with my automated build system. That's why I need Maven to work.

I've made a bit more progress:

> cd kafka
> ./sbt make-pom
> cd core
> cp targets/scala-2.8.0/kafka_2.8.0-0.8.0-SNAPSHOT.pom ./pom.xml
> mvn compile

Here you get a bunch of errors because of 3 missing dependencies caused by
messed up metadata for log4j in the repository.
From
http://stackoverflow.com/questions/9047949/missing-artifact-com-sun-jdmkjmxtoolsjar1-2-1 :

"Change the version of log4j to 1.2.16.
The metadata for 1.2.15 is bad, as you have discovered, because the
dependencies are missing from the central repository. However, there is a
policy of not changing artifacts or metadata in the central maven
repository, because this can lead to builds being unrepeatable. That is, a
build might behave differently if an artifact or its metadata changed."


So edit the pom.xml and change the version for log4j from 1.2.15 to 1.2.16
(maybe somebody can fix it in the sbt configuration so we don't have to
keep patching this by hand?)
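In the generated pom.xml the edit looks roughly like this (the coordinates are the standard log4j ones; only the version changes):

```xml
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <!-- 1.2.15's metadata in Maven Central is broken; 1.2.16 resolves cleanly -->
  <version>1.2.16</version>
</dependency>
```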

Now mvn compile downloads all the dependencies but it doesn't build
anything:

> mvn compile

...
[INFO] No sources to compile
BUILD SUCCESSFUL

'mvn package' gives a similar warning - that the generated jar file will
be empty. So it seems to me that the POM doesn't know anything about
source code that it needs to compile. Maybe this is a quick fix for the
engineer who created the sbt project file, I don't know.

BTW, after renaming the kafka_2.8.0-0.8.0-SNAPSHOT.pom to pom.xml, IDEA
can open the project and it shows the sources in project view, but when I
try to compile the project I get the same result as above (nothing
happens).

Any help on this will be really appreciated.

Thanks,

Florin

On 6/12/13 10:45 AM, "Dragos Manolescu" 
wrote:

>For IntelliJ I've always used the gen-idea sbt plugin:
>https://github.com/mpeltonen/sbt-idea
>
>-Dragos



Re: Kafka 0.8 Maven and IntelliJ

2013-06-12 Thread Dragos Manolescu
For IntelliJ I've always used the gen-idea sbt plugin:
https://github.com/mpeltonen/sbt-idea

-Dragos


On 6/11/13 10:41 PM, "Jason Rosenberg"  wrote:

>Try the one under core/targets?
>
>
>On Tue, Jun 11, 2013 at 3:34 PM, Florin Trofin  wrote:
>
>> I downloaded the latest 0.8 snapshot and I want to build using Maven:
>>
>> ./sbt make-pom
>>
>> Generates a bunch of pom.xml files but when I try to open one of them in
>> IntelliJ they are not recognized. Do I need to do any other step? Which
>> pom do I need to open?
>>
>> Thanks!
>>
>> Florin
>>
>>



Re: Versioning Schema's

2013-06-12 Thread Hargett, Phil
For one of our key Kafka-based applications, we ensure that all messages in the 
stream have a common binary format, which includes (among other things) a 
version identifier and a schema identifier. The version refers to the format 
itself, and the schema refers to the "payload," which is the data for the
application itself. 

Because we have a small number of schemas (50-100) and we only introduce 5-10 
per year, we stash the mapping from schema identifier to schema details for 
easy access in ZooKeeper.

This does basically create 2 levels of serialization but most processing we do 
occurs just by reading the common format, and not deserializing the payload. 
Only specialized code has to do that extra step. 
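A sketch of that kind of common envelope (the field widths and layout below are invented for illustration; the real format is application-specific): a fixed header carrying a format version and a schema identifier, followed by the opaque payload. Most processing can read just the header without deserializing the payload.

```python
import struct

# Hypothetical envelope: 1-byte format version, 4-byte schema id,
# 4-byte payload length, then the payload bytes (big-endian header).
HEADER = struct.Struct(">BII")

def wrap(version: int, schema_id: int, payload: bytes) -> bytes:
    return HEADER.pack(version, schema_id, len(payload)) + payload

def unwrap(message: bytes):
    version, schema_id, length = HEADER.unpack_from(message)
    return version, schema_id, message[HEADER.size:HEADER.size + length]

msg = wrap(1, 42, b"opaque avro payload")
```

A reader that only routes or filters by schema id would call `HEADER.unpack_from` and never touch the payload bytes; only specialized code pays for the second deserialization step.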

On Jun 12, 2013, at 12:50 PM, "Shone Sadler"  wrote:

> Jun,
> I like the idea of an explicit version field, if the schema can be derived
> from the topic name itself. The storage (say 1-4 bytes) would require less
> overhead than a 128 bit md5 at the added cost of managing the version#.
> 
> Is it correct to assume that your applications are using two schemas then,
> one system level schema to deserialize the schema id and bytes for the
> application message and a second schema to deserialize those bytes with the
> application schema?
> 
> Thanks again!
> Shone
> 
> 
> On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao  wrote:
> 
>> Actually, currently our schema id is the md5 of the schema itself. Not
>> fully sure how this compares with an explicit version field in the schema.
>> 
>> Thanks,
>> 
>> Jun
>> 
>> 
>> On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao  wrote:
>> 
>>> At LinkedIn, we are using option 2.
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> 
>>> On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler >> wrote:
>>> 
 Hello everyone,
 
 After doing some searching on the mailing list for best practices on
 integrating Avro with Kafka there appears to be at least 3 options for
 integrating the Avro Schema; 1) embedding the entire schema within the
 message 2) embedding a unique identifier for the schema in the message
>> and
 3) deriving the schema from the topic/resource name.
 
 Option 2 appears to be the best option in terms of both efficiency and
 flexibility.  However, from a programming perspective it complicates the
 solution with the need for both an envelope schema (containing a "schema
 id" and "bytes" field for record data) and message schema (containing the
 application specific message fields).  This requires two levels of
 serialization/deserialization.
 Questions:
 1) How are others dealing with versioning of schemas?
 2) Is there a more elegant means of embedding schema ids in an Avro
 message (I am new to both currently ;-)?
 
 Thanks in advance!
 
 Shone
 
>>> 
>>> 
>> 


Re: Versioning Schema's

2013-06-12 Thread Shone Sadler
Jun,
I like the idea of an explicit version field, if the schema can be derived
from the topic name itself. The storage (say 1-4 bytes) would require less
overhead than a 128 bit md5 at the added cost of managing the version#.

Is it correct to assume that your applications are using two schemas then,
one system level schema to deserialize the schema id and bytes for the
application message and a second schema to deserialize those bytes with the
application schema?

Thanks again!
Shone


On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao  wrote:

> Actually, currently our schema id is the md5 of the schema itself. Not
> fully sure how this compares with an explicit version field in the schema.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao  wrote:
>
> > At LinkedIn, we are using option 2.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler  >wrote:
> >
> >> Hello everyone,
> >>
> >> After doing some searching on the mailing list for best practices on
> >> integrating Avro with Kafka there appears to be at least 3 options for
> >> integrating the Avro Schema; 1) embedding the entire schema within the
> >> message 2) embedding a unique identifier for the schema in the message
> and
> >> 3) deriving the schema from the topic/resource name.
> >>
> >> Option 2 appears to be the best option in terms of both efficiency and
> >> flexibility.  However, from a programming perspective it complicates the
> >> solution with the need for both an envelope schema (containing a "schema
> >> id" and "bytes" field for record data) and message schema (containing the
> >> application specific message fields).  This requires two levels of
> >> serialization/deserialization.
> >> Questions:
> >> 1) How are others dealing with versioning of schemas?
> >> 2) Is there a more elegant means of embedding schema ids in an Avro
> >> message (I am new to both currently ;-)?
> >>
> >> Thanks in advance!
> >>
> >> Shone
> >>
> >
> >
>


Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-12 Thread Alexandre Rodrigues
Hi Jun,

Thanks for your prompt answer. The producer yields those errors in the
beginning, so I think the topic metadata refresh has nothing to do with it.

The problem is one of the brokers isn't leader on any partition assigned to
it and because topics were created with a replication factor of 1, the
producer will never connect to that broker at all. What I don't understand
is why the broker doesn't assume leadership of those partitions.

I deleted all the topics and tried now with a replication factor of two

topic: A  partition: 0leader: 1   replicas: 1,0   isr: 1
topic: A  partition: 1leader: 0   replicas: 0,1   isr: 0,1
topic: B partition: 0leader: 0   replicas: 0,1   isr: 0,1
topic: B partition: 1leader: 1   replicas: 1,0   isr: 1
topic: C  partition: 0leader: 1   replicas: 1,0   isr: 1
topic: C  partition: 1leader: 0   replicas: 0,1   isr: 0,1


Now producer doesn't yield errors. However, one of the brokers ( broker 0 )
generates lots of lines like this:

[2013-06-12 16:19:41,805] WARN [KafkaApi-0] Produce request with
correlation id 404999 from client  on partition [B,0] failed due to
Partition [B,0] doesn't exist on 0 (kafka.server.KafkaApis)

There should be a replica there, so I don't know why it complains about
that message.

Have you ever found anything like this?



On 12 June 2013 16:27, Jun Rao  wrote:

> If the leaders exist in both brokers, the producer should be able to
> connect to both of them, assuming you don't provide any key when sending
> the data. Could you try restarting the producer? If there has been broker
> failures, it may take topic.metadata.refresh.interval.ms for the producer
> to pick up the newly available partitions (see
> http://kafka.apache.org/08/configuration.html for details).
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 8:01 AM, Alexandre Rodrigues <
> alexan...@blismedia.com> wrote:
>
> > Hi,
> >
> > I have a Kafka 0.8 cluster with two nodes connected to three ZKs, with
> the
> > same configuration but the brokerId (one is 0 and the other 1). I created
> > three topics A, B and C with 4 partitions and a replication factor of 1.
> My
> > idea was to have 2 partitions per topic in each broker. However, when I
> > connect a producer, I can't get both brokers to write at the same time,
> > and I don't know what's going on.
> >
> > My server.config has the following entries:
> >
> > auto.create.topics.enable=true
> > num.partitions=2
> >
> >
> > When I run bin/kafka-list-topic.sh --zookeeper localhost:2181   I get the
> > following partition leader assignments:
> >
> > topic: A  partition: 0leader: 1   replicas: 1 isr: 1
> > topic: A  partition: 1leader: 0   replicas: 0 isr: 0
> > topic: A  partition: 2leader: 1   replicas: 1 isr: 1
> > topic: A  partition: 3leader: 0   replicas: 0 isr: 0
> > topic: B partition: 0leader: 0   replicas: 0 isr: 0
> > topic: B partition: 1leader: 1   replicas: 1 isr: 1
> > topic: B partition: 2leader: 0   replicas: 0 isr: 0
> > topic: B partition: 3leader: 1   replicas: 1 isr: 1
> > topic: C  partition: 0leader: 0   replicas: 0 isr: 0
> > topic: C  partition: 1leader: 1   replicas: 1 isr: 1
> > topic: C  partition: 2leader: 0   replicas: 0 isr: 0
> > topic: C  partition: 3leader: 1   replicas: 1 isr: 1
> >
> >
> > I've forced reassignment using the kafka-reassign-partitions tool with
> the
> > following JSON:
> >
> > {"partitions":  [
> >{"topic": "A", "partition": 1, "replicas": [0] },
> >{"topic": "A", "partition": 3, "replicas": [0] },
> >{"topic": "A", "partition": 0, "replicas": [1] },
> >{"topic": "A", "partition": 2, "replicas": [1] },
> >{"topic": "B", "partition": 1, "replicas": [0] },
> >{"topic": "B", "partition": 3, "replicas": [0] },
> >{"topic": "B", "partition": 0, "replicas": [1] },
> >{"topic": "B", "partition": 2, "replicas": [1] },
> >{"topic": "C", "partition": 0, "replicas": [0] },
> >{"topic": "C", "partition": 1, "replicas": [1] },
> >{"topic": "C", "partition": 2, "replicas": [0] },
> >{"topic": "C", "partition": 3, "replicas": [1] }
> > ]}
> >
> > After reassignment, I've restarted producer and nothing worked. I've
> tried
> > also to restart both brokers and producer and nothing.
> >
> > The producer log contains entries like these:
> >
> > 2013-06-12 14:48:46,467] WARN Error while fetching metadatapartition
> 0
> > leader: nonereplicas:   isr:isUnderReplicated: false for
> > topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
> > (kafka.producer.BrokerPartitionInfo)
> > [2013-06-12 14:48:46,467] WARN Error while fetching metadata
>  partition 0
> > leader: nonereplicas:   isr:isUnderReplicated: false for
> > topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
> > (kafka.producer.Brok

Re: Versioning Schema's

2013-06-12 Thread Jun Rao
Actually, currently our schema id is the md5 of the schema itself. Not
fully sure how this compares with an explicit version field in the schema.

Thanks,

Jun


On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao  wrote:

> At LinkedIn, we are using option 2.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler wrote:
>
>> Hello everyone,
>>
>> After doing some searching on the mailing list for best practices on
>> integrating Avro with Kafka there appears to be at least 3 options for
>> integrating the Avro Schema; 1) embedding the entire schema within the
>> message 2) embedding a unique identifier for the schema in the message and
>> 3) deriving the schema from the topic/resource name.
>>
>> Option 2 appears to be the best option in terms of both efficiency and
>> flexibility.  However, from a programming perspective it complicates the
>> solution with the need for both an envelope schema (containing a "schema
>> id" and "bytes" field for record data) and message schema (containing the
>> application specific message fields).  This requires two levels of
>> serialization/deserialization.
>> Questions:
>> 1) How are others dealing with versioning of schemas?
>> 2) Is there a more elegant means of embedding schema ids in an Avro
>> message (I am new to both currently ;-)?
>>
>> Thanks in advance!
>>
>> Shone
>>
>
>


Re: Versioning Schema's

2013-06-12 Thread Jun Rao
At LinkedIn, we are using option 2.

Thanks,

Jun


On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler wrote:

> Hello everyone,
>
> After doing some searching on the mailing list for best practices on
> integrating Avro with Kafka there appears to be at least 3 options for
> integrating the Avro Schema; 1) embedding the entire schema within the
> message 2) embedding a unique identifier for the schema in the message and
> 3) deriving the schema from the topic/resource name.
>
> Option 2 appears to be the best option in terms of both efficiency and
> flexibility.  However, from a programming perspective it complicates the
> solution with the need for both an envelope schema (containing a "schema
> id" and "bytes" field for record data) and message schema (containing the
> application specific message fields).  This requires two levels of
> serialization/deserialization.
> Questions:
> 1) How are others dealing with versioning of schemas?
> 2) Is there a more elegant means of embedding schema ids in an Avro
> message (I am new to both currently ;-)?
>
> Thanks in advance!
>
> Shone
>


Re: Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-12 Thread Jun Rao
If the leaders exist in both brokers, the producer should be able to
connect to both of them, assuming you don't provide any key when sending
the data. Could you try restarting the producer? If there has been broker
failures, it may take topic.metadata.refresh.interval.ms for the producer
to pick up the newly available partitions (see
http://kafka.apache.org/08/configuration.html for details).

Thanks,

Jun


On Wed, Jun 12, 2013 at 8:01 AM, Alexandre Rodrigues <
alexan...@blismedia.com> wrote:

> Hi,
>
> I have a Kafka 0.8 cluster with two nodes connected to three ZKs, with the
> same configuration but the brokerId (one is 0 and the other 1). I created
> three topics A, B and C with 4 partitions and a replication factor of 1. My
> idea was to have 2 partitions per topic in each broker. However, when I
> connect a producer, I can't get both brokers to write at the same time, and
> I don't know what's going on.
>
> My server.config has the following entries:
>
> auto.create.topics.enable=true
> num.partitions=2
>
>
> When I run bin/kafka-list-topic.sh --zookeeper localhost:2181   I get the
> following partition leader assignments:
>
> topic: A  partition: 0leader: 1   replicas: 1 isr: 1
> topic: A  partition: 1leader: 0   replicas: 0 isr: 0
> topic: A  partition: 2leader: 1   replicas: 1 isr: 1
> topic: A  partition: 3leader: 0   replicas: 0 isr: 0
> topic: B partition: 0leader: 0   replicas: 0 isr: 0
> topic: B partition: 1leader: 1   replicas: 1 isr: 1
> topic: B partition: 2leader: 0   replicas: 0 isr: 0
> topic: B partition: 3leader: 1   replicas: 1 isr: 1
> topic: C  partition: 0leader: 0   replicas: 0 isr: 0
> topic: C  partition: 1leader: 1   replicas: 1 isr: 1
> topic: C  partition: 2leader: 0   replicas: 0 isr: 0
> topic: C  partition: 3leader: 1   replicas: 1 isr: 1
>
>
> I've forced reassignment using the kafka-reassign-partitions tool with the
> following JSON:
>
> {"partitions":  [
>{"topic": "A", "partition": 1, "replicas": [0] },
>{"topic": "A", "partition": 3, "replicas": [0] },
>{"topic": "A", "partition": 0, "replicas": [1] },
>{"topic": "A", "partition": 2, "replicas": [1] },
>{"topic": "B", "partition": 1, "replicas": [0] },
>{"topic": "B", "partition": 3, "replicas": [0] },
>{"topic": "B", "partition": 0, "replicas": [1] },
>{"topic": "B", "partition": 2, "replicas": [1] },
>{"topic": "C", "partition": 0, "replicas": [0] },
>{"topic": "C", "partition": 1, "replicas": [1] },
>{"topic": "C", "partition": 2, "replicas": [0] },
>{"topic": "C", "partition": 3, "replicas": [1] }
> ]}
>
> After reassignment, I restarted the producer, but nothing changed. I also
> tried restarting both brokers and the producer, with no effect.
>
> The producer logs contain lines like these:
>
> [2013-06-12 14:48:46,467] WARN Error while fetching metadata partition 0
> leader: none  replicas:  isr:  isUnderReplicated: false for
> topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
> (kafka.producer.BrokerPartitionInfo)
> [2013-06-12 14:48:46,467] WARN Error while fetching metadata partition 0
> leader: none  replicas:  isr:  isUnderReplicated: false for
> topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
> (kafka.producer.BrokerPartitionInfo)
> [2013-06-12 14:48:46,468] WARN Error while fetching metadata partition 2
> leader: none  replicas:  isr:  isUnderReplicated: false for
> topic partition [C,2]: [class kafka.common.LeaderNotAvailableException]
> (kafka.producer.BrokerPartitionInfo)
> [2013-06-12 14:48:46,468] WARN Error while fetching metadata partition 2
> leader: none  replicas:  isr:  isUnderReplicated: false for
> topic partition [C,2]: [class kafka.common.LeaderNotAvailableException]
> (kafka.producer.BrokerPartitionInfo)
>
>
> And sometimes lines like this:
>
> [2013-06-12 14:55:37,339] WARN Error while fetching metadata
> [{TopicMetadata for topic B ->
> No partition metadata for topic B due to
> kafka.common.LeaderNotAvailableException}] for topic [B]: class
> kafka.common.LeaderNotAvailableException
>  (kafka.producer.BrokerPartitionInfo)
>
>
> Do you guys have any idea what's going on?
>
> Thanks in advance,
> Alex
>

Producer will pick one of the two brokers, but never the two at same time [0.8]

2013-06-12 Thread Alexandre Rodrigues
Hi,

I have a Kafka 0.8 cluster with two nodes connected to three ZKs, with the
same configuration but the brokerId (one is 0 and the other 1). I created
three topics A, B and C with 4 partitions and a replication factor of 1. My
idea was to have 2 partitions per topic on each broker. However, when I
connect a producer, I can't get both brokers to receive writes at the same
time, and I don't know what's going on.

My server.config has the following entries:

auto.create.topics.enable=true
num.partitions=2


When I run bin/kafka-list-topic.sh --zookeeper localhost:2181 I get the
following partition leader assignments:

topic: A  partition: 0  leader: 1  replicas: 1  isr: 1
topic: A  partition: 1  leader: 0  replicas: 0  isr: 0
topic: A  partition: 2  leader: 1  replicas: 1  isr: 1
topic: A  partition: 3  leader: 0  replicas: 0  isr: 0
topic: B  partition: 0  leader: 0  replicas: 0  isr: 0
topic: B  partition: 1  leader: 1  replicas: 1  isr: 1
topic: B  partition: 2  leader: 0  replicas: 0  isr: 0
topic: B  partition: 3  leader: 1  replicas: 1  isr: 1
topic: C  partition: 0  leader: 0  replicas: 0  isr: 0
topic: C  partition: 1  leader: 1  replicas: 1  isr: 1
topic: C  partition: 2  leader: 0  replicas: 0  isr: 0
topic: C  partition: 3  leader: 1  replicas: 1  isr: 1


I've forced reassignment using the kafka-reassign-partitions tool with the
following JSON:

{"partitions":  [
   {"topic": "A", "partition": 1, "replicas": [0] },
   {"topic": "A", "partition": 3, "replicas": [0] },
   {"topic": "A", "partition": 0, "replicas": [1] },
   {"topic": "A", "partition": 2, "replicas": [1] },
   {"topic": "B", "partition": 1, "replicas": [0] },
   {"topic": "B", "partition": 3, "replicas": [0] },
   {"topic": "B", "partition": 0, "replicas": [1] },
   {"topic": "B", "partition": 2, "replicas": [1] },
   {"topic": "C", "partition": 0, "replicas": [0] },
   {"topic": "C", "partition": 1, "replicas": [1] },
   {"topic": "C", "partition": 2, "replicas": [0] },
   {"topic": "C", "partition": 3, "replicas": [1] }
]}
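The intended balance of that reassignment (2 partitions per topic on each broker, with replication factor 1) can be sanity-checked programmatically. A quick illustrative script, not a Kafka tool:

```python
import json
from collections import Counter

reassignment = json.loads("""
{"partitions": [
    {"topic": "A", "partition": 1, "replicas": [0]},
    {"topic": "A", "partition": 3, "replicas": [0]},
    {"topic": "A", "partition": 0, "replicas": [1]},
    {"topic": "A", "partition": 2, "replicas": [1]},
    {"topic": "B", "partition": 1, "replicas": [0]},
    {"topic": "B", "partition": 3, "replicas": [0]},
    {"topic": "B", "partition": 0, "replicas": [1]},
    {"topic": "B", "partition": 2, "replicas": [1]},
    {"topic": "C", "partition": 0, "replicas": [0]},
    {"topic": "C", "partition": 1, "replicas": [1]},
    {"topic": "C", "partition": 2, "replicas": [0]},
    {"topic": "C", "partition": 3, "replicas": [1]}
]}
""")

# Count (topic, broker) -> number of partitions hosted by that broker.
counts = Counter(
    (p["topic"], broker)
    for p in reassignment["partitions"]
    for broker in p["replicas"]
)

# With replication factor 1, each topic puts exactly 2 partitions on each broker.
for topic in ("A", "B", "C"):
    for broker in (0, 1):
        assert counts[(topic, broker)] == 2
print("balanced: 2 partitions per topic per broker")
```

So the assignment itself is balanced; the problem is on the producer side, not in the partition layout.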

After reassignment, I restarted the producer, but nothing changed. I also
tried restarting both brokers and the producer, with no effect.

The producer logs contain lines like these:

[2013-06-12 14:48:46,467] WARN Error while fetching metadata partition 0
leader: none  replicas:  isr:  isUnderReplicated: false for
topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-06-12 14:48:46,467] WARN Error while fetching metadata partition 0
leader: none  replicas:  isr:  isUnderReplicated: false for
topic partition [C,0]: [class kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-06-12 14:48:46,468] WARN Error while fetching metadata partition 2
leader: none  replicas:  isr:  isUnderReplicated: false for
topic partition [C,2]: [class kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-06-12 14:48:46,468] WARN Error while fetching metadata partition 2
leader: none  replicas:  isr:  isUnderReplicated: false for
topic partition [C,2]: [class kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)


And sometimes lines like this:

[2013-06-12 14:55:37,339] WARN Error while fetching metadata
[{TopicMetadata for topic B ->
No partition metadata for topic B due to
kafka.common.LeaderNotAvailableException}] for topic [B]: class
kafka.common.LeaderNotAvailableException
 (kafka.producer.BrokerPartitionInfo)


Do you guys have any idea what's going on?

Thanks in advance,
Alex

-- 

@BlisMedia 

www.blismedia.com 

This email and any attachments to it may be confidential and are intended 
solely 
for the use of the individual to whom it is addressed. Any views or opinions 
expressed are solely those of the author and do not necessarily represent 
those of BlisMedia Ltd, a company registered in England and Wales with 
registered number 06455773. Its registered office is 3rd Floor, 101 New 
Cavendish St, London, W1W 6XH, United Kingdom.

If you are not the intended recipient of this email, you must neither take 
any action based upon its contents, nor copy or show it to anyone. Please 
contact the sender if you believe you have received this email in error. 


kafka 0.8.0 beta release

2013-06-12 Thread Jun Rao
Hi, Everyone,

Joe Stein has kindly agreed to drive this release. We'd like to
get KAFKA-937 committed to 0.8 (should happen today) and then call a vote.

Thanks,

Jun


Versioning Schemas

2013-06-12 Thread Shone Sadler
Hello everyone,

After doing some searching on the mailing list for best practices on
integrating Avro with Kafka, there appear to be at least three options for
handling the Avro schema: 1) embedding the entire schema within the
message, 2) embedding a unique identifier for the schema in the message, and
3) deriving the schema from the topic/resource name.

Option 2 appears to be the best option in terms of both efficiency and
flexibility. However, from a programming perspective it complicates the
solution with the need for both an envelope schema (containing a "schema
id" and a "bytes" field for the record data) and a message schema (containing
the application-specific message fields). This requires two levels of
serialization/deserialization.
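The envelope for option 2 can be as small as a fixed-width header. A minimal sketch of the two-level encode/decode, using plain struct packing for the header; the 4-byte id width and the helper names here are assumptions for illustration, and a real implementation would do the second level with Avro against the schema looked up by id:

```python
import struct

HEADER = struct.Struct(">I")  # 4-byte big-endian schema id (assumed width)


def wrap(schema_id: int, payload: bytes) -> bytes:
    """Level 1 encode: prepend the schema id to the Avro-encoded record bytes."""
    return HEADER.pack(schema_id) + payload


def unwrap(message: bytes) -> tuple[int, bytes]:
    """Level 1 decode: split the schema id from the record bytes.
    The consumer then looks the schema up by id and does the level-2
    Avro decode of `payload` with that schema."""
    (schema_id,) = HEADER.unpack_from(message)
    return schema_id, message[HEADER.size:]


# Round trip; the payload stands in for Avro-serialized record bytes.
msg = wrap(42, b"\x02avro-record-bytes")
schema_id, payload = unwrap(msg)
assert schema_id == 42
assert payload == b"\x02avro-record-bytes"
```

With a fixed-width header the "envelope" no longer needs its own Avro schema, at the cost of a convention every producer and consumer must agree on.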
Questions:
1) How are others dealing with versioning of schemas?
2) Is there a more elegant means of embedding schema ids in an Avro
message? (I am new to both currently ;-)

Thanks in advance!

Shone