Re: Kafka wiki Documentation conventions - looking for feedback

2013-05-01 Thread Jun Rao
The following are sample encoder/decoder in java.

class StringEncode implements Encoder {
   private String encoding = null;
   public StringEncoder(VerifiableProperties props) {
if(props == null)
  encoding = "UTF8";
else
  encoding = props.getString("serializer.encoding", "UTF8");
  }

  public byte[] def toBytes(String s) {
if(s == null)
  return null;
else
  return s.getBytes(encoding);
  }
}

class StringDecoder implements Decoder {
  private String encoding = null;
  public StringDecoder(VerifiableProperties props) {
if(props == null)
  encoding = "UTF8";
else
  encoding = props.getString("serializer.encoding", "UTF8");
  }

  public String fromBytes(byte bytes[]) {
return new String(bytes, encoding);
  }
}


Thanks,

Jun


On Wed, May 1, 2013 at 12:33 PM, Chris Curtin wrote:

> Hi Jun
>
> I've added #1 and #2.
>
> I'll need to think about where to put #3, maybe even adding a 'tips and
> tricks' section?
>
> I've not had to do any encoder/decoders. Can anyone else offer some example
> code I can incorporate into an example?
>
> Thanks,
>
> Chris
>
>
> On Wed, May 1, 2013 at 11:45 AM, Jun Rao  wrote:
>
> > Chris,
> >
> > Thanks. This is very helpful. I linked your wiki pages to our website. A
> > few more comments:
> >
> > 1. Producer: The details of the meaning of request.required.acks are
> > described in http://kafka.apache.org/08/configuration.html. It would be
> > great if you can add a link to the description in your wiki.
> >
> > 2. High level consumer: Could you add the proper way of stopping the
> > consumer? One just need to call consumer.shutdown(). After this is
> called,
> > hasNext() call in the Kafka stream iterator will return false.
> >
> > 3. SimpleConsumer: We have the following api that returns the offset of
> the
> > last message exposed to the consumer. The difference btw high watermark
> and
> > the offset of the last consumed message tells you how many messages the
> > consumer is behind the broker.
> >   highWatermark(topic: String, partition: Int)
> >
> > Finally, it would be great if you can extend the wiki with customized
> > encoder (Producer) and decoder (Consumer) at some point.
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, May 1, 2013 at 6:44 AM, Chris Curtin 
> > wrote:
> >
> > > I've tested my examples with the new (4/30) release and they work, so
> > I've
> > > updated the documentation.
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Apr 29, 2013 at 6:18 PM, Jun Rao  wrote:
> > >
> > > > Thanks. I also updated your producer example to reflect a recent
> config
> > > > change (broker.list => metadata.broker.list).
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 11:19 AM, Chris Curtin <
> curtin.ch...@gmail.com
> > > > >wrote:
> > > >
> > > > > Thanks, I missed that the addition of consumers can cause a
> > re-balance.
> > > > > Thought it was only on Leader changes.
> > > > >
> > > > > I've updated the wording in the example.
> > > > >
> > > > > I'll pull down the beta and test my application then change the
> names
> > > on
> > > > > the properties.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Chris
> > > > >
> > > > >
> > > > > On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao 
> wrote:
> > > > >
> > > > > > Basically, every time a consumer joins a group, every consumer in
> > the
> > > > > > groups gets a ZK notification and each of them tries to own a
> > subset
> > > of
> > > > > the
> > > > > > total number of partitions. A given partition is only assigned to
> > one
> > > > of
> > > > > > the consumers in the same group. Once the ownership is
> determined,
> > > each
> > > > > > consumer consumes messages coming from its partitions and manages
> > the
> > > > > > offset of those partitions. Since at any given point of time, a
> > > > partition
> > > > > > is only owned by one consumer, there won't be conflicts on
> updating
> > > the
> > > > > > offsets. More details are described in the "consumer rebalancing
> > > > > algorithm"
> > > > > > section of http://kafka.apache.org/07/design.html
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin <
> > > curtin.ch...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > Jun, can you explain this a little better? I thought when using
> > > > > Consumer
> > > > > > > Groups that on startup Kafka connects to ZooKeeper and finds
> the
> > > last
> > > > > > read
> > > > > > > offset for every partition in the topic being requested for the
> > > > group.
> > > > > > That
> > > > > > > is then the starting point for the consumer threads.
> > > > > > >
> > > > > > > If a second process starts while the first one is running with
> > the
> > > > same
> > > > > > > Consumer Group, won't the second one read the last offsets
> > consumed
> > > > by
> > > > > > the
> > > > > > > already running process and start processing from there? Then
> as
> > > the
> 

Re: Kafka wiki Documentation conventions - looking for feedback

2013-05-01 Thread Chris Curtin
Hi Jun

I've added #1 and #2.

I'll need to think about where to put #3, maybe even adding a 'tips and
tricks' section?

I've not had to do any encoder/decoders. Can anyone else offer some example
code I can incorporate into an example?

Thanks,

Chris


On Wed, May 1, 2013 at 11:45 AM, Jun Rao  wrote:

> Chris,
>
> Thanks. This is very helpful. I linked your wiki pages to our website. A
> few more comments:
>
> 1. Producer: The details of the meaning of request.required.acks are
> described in http://kafka.apache.org/08/configuration.html. It would be
> great if you can add a link to the description in your wiki.
>
> 2. High level consumer: Could you add the proper way of stopping the
> consumer? One just need to call consumer.shutdown(). After this is called,
> hasNext() call in the Kafka stream iterator will return false.
>
> 3. SimpleConsumer: We have the following api that returns the offset of the
> last message exposed to the consumer. The difference btw high watermark and
> the offset of the last consumed message tells you how many messages the
> consumer is behind the broker.
>   highWatermark(topic: String, partition: Int)
>
> Finally, it would be great if you can extend the wiki with customized
> encoder (Producer) and decoder (Consumer) at some point.
> Thanks,
>
> Jun
>
>
> On Wed, May 1, 2013 at 6:44 AM, Chris Curtin 
> wrote:
>
> > I've tested my examples with the new (4/30) release and they work, so
> I've
> > updated the documentation.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Mon, Apr 29, 2013 at 6:18 PM, Jun Rao  wrote:
> >
> > > Thanks. I also updated your producer example to reflect a recent config
> > > change (broker.list => metadata.broker.list).
> > >
> > > Jun
> > >
> > >
> > > On Mon, Apr 29, 2013 at 11:19 AM, Chris Curtin  > > >wrote:
> > >
> > > > Thanks, I missed that the addition of consumers can cause a
> re-balance.
> > > > Thought it was only on Leader changes.
> > > >
> > > > I've updated the wording in the example.
> > > >
> > > > I'll pull down the beta and test my application then change the names
> > on
> > > > the properties.
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao  wrote:
> > > >
> > > > > Basically, every time a consumer joins a group, every consumer in
> the
> > > > > groups gets a ZK notification and each of them tries to own a
> subset
> > of
> > > > the
> > > > > total number of partitions. A given partition is only assigned to
> one
> > > of
> > > > > the consumers in the same group. Once the ownership is determined,
> > each
> > > > > consumer consumes messages coming from its partitions and manages
> the
> > > > > offset of those partitions. Since at any given point of time, a
> > > partition
> > > > > is only owned by one consumer, there won't be conflicts on updating
> > the
> > > > > offsets. More details are described in the "consumer rebalancing
> > > > algorithm"
> > > > > section of http://kafka.apache.org/07/design.html
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin <
> > curtin.ch...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Jun, can you explain this a little better? I thought when using
> > > > Consumer
> > > > > > Groups that on startup Kafka connects to ZooKeeper and finds the
> > last
> > > > > read
> > > > > > offset for every partition in the topic being requested for the
> > > group.
> > > > > That
> > > > > > is then the starting point for the consumer threads.
> > > > > >
> > > > > > If a second process starts while the first one is running with
> the
> > > same
> > > > > > Consumer Group, won't the second one read the last offsets
> consumed
> > > by
> > > > > the
> > > > > > already running process and start processing from there? Then as
> > the
> > > > > first
> > > > > > process syncs consumed offsets, won't the 2nd process's next
> update
> > > > > > overwrite them?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao 
> > wrote:
> > > > > >
> > > > > > > Chris,
> > > > > > >
> > > > > > > Thanks for the writeup. Looks great overall. A couple of
> > comments.
> > > > > > >
> > > > > > > 1. At the beginning, it sounds like that one can't run multiple
> > > > > processes
> > > > > > > of consumers in the same group. This is actually not true. We
> can
> > > > > create
> > > > > > > multiple instances of consumers for the same group in the same
> > JVM
> > > or
> > > > > > > different JVMs. The consumers will auto-balance among
> themselves.
> > > > > > >
> > > > > > > 2. We have changed the name of some config properties.
> > > > > > > auto.commit.interval.ms is correct. However, zk.connect,
> > > > > > > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > > > > > > zookeeper.connect,
> > > > > > > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms,
> 

Re: Kafka wiki Documentation conventions - looking for feedback

2013-05-01 Thread Jun Rao
Chris,

Thanks. This is very helpful. I linked your wiki pages to our website. A
few more comments:

1. Producer: The details of the meaning of request.required.acks are
described in http://kafka.apache.org/08/configuration.html. It would be
great if you can add a link to the description in your wiki.

2. High level consumer: Could you add the proper way of stopping the
consumer? One just need to call consumer.shutdown(). After this is called,
hasNext() call in the Kafka stream iterator will return false.

3. SimpleConsumer: We have the following api that returns the offset of the
last message exposed to the consumer. The difference btw high watermark and
the offset of the last consumed message tells you how many messages the
consumer is behind the broker.
  highWatermark(topic: String, partition: Int)

Finally, it would be great if you can extend the wiki with customized
encoder (Producer) and decoder (Consumer) at some point.
Thanks,

Jun


On Wed, May 1, 2013 at 6:44 AM, Chris Curtin  wrote:

> I've tested my examples with the new (4/30) release and they work, so I've
> updated the documentation.
>
> Thanks,
>
> Chris
>
>
> On Mon, Apr 29, 2013 at 6:18 PM, Jun Rao  wrote:
>
> > Thanks. I also updated your producer example to reflect a recent config
> > change (broker.list => metadata.broker.list).
> >
> > Jun
> >
> >
> > On Mon, Apr 29, 2013 at 11:19 AM, Chris Curtin  > >wrote:
> >
> > > Thanks, I missed that the addition of consumers can cause a re-balance.
> > > Thought it was only on Leader changes.
> > >
> > > I've updated the wording in the example.
> > >
> > > I'll pull down the beta and test my application then change the names
> on
> > > the properties.
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao  wrote:
> > >
> > > > Basically, every time a consumer joins a group, every consumer in the
> > > > groups gets a ZK notification and each of them tries to own a subset
> of
> > > the
> > > > total number of partitions. A given partition is only assigned to one
> > of
> > > > the consumers in the same group. Once the ownership is determined,
> each
> > > > consumer consumes messages coming from its partitions and manages the
> > > > offset of those partitions. Since at any given point of time, a
> > partition
> > > > is only owned by one consumer, there won't be conflicts on updating
> the
> > > > offsets. More details are described in the "consumer rebalancing
> > > algorithm"
> > > > section of http://kafka.apache.org/07/design.html
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin <
> curtin.ch...@gmail.com
> > > > >wrote:
> > > >
> > > > > Jun, can you explain this a little better? I thought when using
> > > Consumer
> > > > > Groups that on startup Kafka connects to ZooKeeper and finds the
> last
> > > > read
> > > > > offset for every partition in the topic being requested for the
> > group.
> > > > That
> > > > > is then the starting point for the consumer threads.
> > > > >
> > > > > If a second process starts while the first one is running with the
> > same
> > > > > Consumer Group, won't the second one read the last offsets consumed
> > by
> > > > the
> > > > > already running process and start processing from there? Then as
> the
> > > > first
> > > > > process syncs consumed offsets, won't the 2nd process's next update
> > > > > overwrite them?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Chris
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao 
> wrote:
> > > > >
> > > > > > Chris,
> > > > > >
> > > > > > Thanks for the writeup. Looks great overall. A couple of
> comments.
> > > > > >
> > > > > > 1. At the beginning, it sounds like that one can't run multiple
> > > > processes
> > > > > > of consumers in the same group. This is actually not true. We can
> > > > create
> > > > > > multiple instances of consumers for the same group in the same
> JVM
> > or
> > > > > > different JVMs. The consumers will auto-balance among themselves.
> > > > > >
> > > > > > 2. We have changed the name of some config properties.
> > > > > > auto.commit.interval.ms is correct. However, zk.connect,
> > > > > > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > > > > > zookeeper.connect,
> > > > > > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms,
> > > > respectively.
> > > > > >
> > > > > > I will add a link to your wiki in our website.
> > > > > >
> > > > > > Thanks again.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin <
> > > curtin.ch...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I finished and published it this morning:
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > > > >
> > > > > > > One question: when documenting the ConsumerConfig paramet

Re: Kafka wiki Documentation conventions - looking for feedback

2013-05-01 Thread Chris Curtin
I've tested my examples with the new (4/30) release and they work, so I've
updated the documentation.

Thanks,

Chris


On Mon, Apr 29, 2013 at 6:18 PM, Jun Rao  wrote:

> Thanks. I also updated your producer example to reflect a recent config
> change (broker.list => metadata.broker.list).
>
> Jun
>
>
> On Mon, Apr 29, 2013 at 11:19 AM, Chris Curtin  >wrote:
>
> > Thanks, I missed that the addition of consumers can cause a re-balance.
> > Thought it was only on Leader changes.
> >
> > I've updated the wording in the example.
> >
> > I'll pull down the beta and test my application then change the names on
> > the properties.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao  wrote:
> >
> > > Basically, every time a consumer joins a group, every consumer in the
> > > groups gets a ZK notification and each of them tries to own a subset of
> > the
> > > total number of partitions. A given partition is only assigned to one
> of
> > > the consumers in the same group. Once the ownership is determined, each
> > > consumer consumes messages coming from its partitions and manages the
> > > offset of those partitions. Since at any given point of time, a
> partition
> > > is only owned by one consumer, there won't be conflicts on updating the
> > > offsets. More details are described in the "consumer rebalancing
> > algorithm"
> > > section of http://kafka.apache.org/07/design.html
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin  > > >wrote:
> > >
> > > > Jun, can you explain this a little better? I thought when using
> > Consumer
> > > > Groups that on startup Kafka connects to ZooKeeper and finds the last
> > > read
> > > > offset for every partition in the topic being requested for the
> group.
> > > That
> > > > is then the starting point for the consumer threads.
> > > >
> > > > If a second process starts while the first one is running with the
> same
> > > > Consumer Group, won't the second one read the last offsets consumed
> by
> > > the
> > > > already running process and start processing from there? Then as the
> > > first
> > > > process syncs consumed offsets, won't the 2nd process's next update
> > > > overwrite them?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao  wrote:
> > > >
> > > > > Chris,
> > > > >
> > > > > Thanks for the writeup. Looks great overall. A couple of comments.
> > > > >
> > > > > 1. At the beginning, it sounds like that one can't run multiple
> > > processes
> > > > > of consumers in the same group. This is actually not true. We can
> > > create
> > > > > multiple instances of consumers for the same group in the same JVM
> or
> > > > > different JVMs. The consumers will auto-balance among themselves.
> > > > >
> > > > > 2. We have changed the name of some config properties.
> > > > > auto.commit.interval.ms is correct. However, zk.connect,
> > > > > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > > > > zookeeper.connect,
> > > > > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms,
> > > respectively.
> > > > >
> > > > > I will add a link to your wiki in our website.
> > > > >
> > > > > Thanks again.
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin <
> > curtin.ch...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > I finished and published it this morning:
> > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > > >
> > > > > > One question: when documenting the ConsumerConfig parameters I
> > > couldn't
> > > > > > find a description for the 'auto.commit.interval.ms' setting. I
> > > found
> > > > > one
> > > > > > for 'autocommit.interval.ms' (no '.' between auto and commit) in
> > the
> > > > > > Google
> > > > > > Cache only. Which spelling is it? Also is my description of it
> > > correct?
> > > > > >
> > > > > > I'll take a look at custom encoders later this week. Today and
> > > Tuesday
> > > > > are
> > > > > > going to be pretty busy.
> > > > > >
> > > > > > Please let me know if there are changes needed to the High Level
> > > > Consumer
> > > > > > page.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao 
> > wrote:
> > > > > >
> > > > > > > Chris,
> > > > > > >
> > > > > > > Any update of the high level consumer example?
> > > > > > >
> > > > > > > Also, in the Producer example, it would be useful to describe
> how
> > > to
> > > > > > write
> > > > > > > a customized encoder. One subtle thing is that the encoder
> needs
> > a
> > > > > > > constructor that takes a a single VerifiableProperties
> argument (
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-869).
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > >

Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Jun Rao
Thanks. I also updated your producer example to reflect a recent config
change (broker.list => metadata.broker.list).

Jun


On Mon, Apr 29, 2013 at 11:19 AM, Chris Curtin wrote:

> Thanks, I missed that the addition of consumers can cause a re-balance.
> Thought it was only on Leader changes.
>
> I've updated the wording in the example.
>
> I'll pull down the beta and test my application then change the names on
> the properties.
>
> Thanks,
>
> Chris
>
>
> On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao  wrote:
>
> > Basically, every time a consumer joins a group, every consumer in the
> > groups gets a ZK notification and each of them tries to own a subset of
> the
> > total number of partitions. A given partition is only assigned to one of
> > the consumers in the same group. Once the ownership is determined, each
> > consumer consumes messages coming from its partitions and manages the
> > offset of those partitions. Since at any given point of time, a partition
> > is only owned by one consumer, there won't be conflicts on updating the
> > offsets. More details are described in the "consumer rebalancing
> algorithm"
> > section of http://kafka.apache.org/07/design.html
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin  > >wrote:
> >
> > > Jun, can you explain this a little better? I thought when using
> Consumer
> > > Groups that on startup Kafka connects to ZooKeeper and finds the last
> > read
> > > offset for every partition in the topic being requested for the group.
> > That
> > > is then the starting point for the consumer threads.
> > >
> > > If a second process starts while the first one is running with the same
> > > Consumer Group, won't the second one read the last offsets consumed by
> > the
> > > already running process and start processing from there? Then as the
> > first
> > > process syncs consumed offsets, won't the 2nd process's next update
> > > overwrite them?
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > >
> > >
> > > On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao  wrote:
> > >
> > > > Chris,
> > > >
> > > > Thanks for the writeup. Looks great overall. A couple of comments.
> > > >
> > > > 1. At the beginning, it sounds like that one can't run multiple
> > processes
> > > > of consumers in the same group. This is actually not true. We can
> > create
> > > > multiple instances of consumers for the same group in the same JVM or
> > > > different JVMs. The consumers will auto-balance among themselves.
> > > >
> > > > 2. We have changed the name of some config properties.
> > > > auto.commit.interval.ms is correct. However, zk.connect,
> > > > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > > > zookeeper.connect,
> > > > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms,
> > respectively.
> > > >
> > > > I will add a link to your wiki in our website.
> > > >
> > > > Thanks again.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin <
> curtin.ch...@gmail.com
> > > > >wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > I finished and published it this morning:
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > >
> > > > > One question: when documenting the ConsumerConfig parameters I
> > couldn't
> > > > > find a description for the 'auto.commit.interval.ms' setting. I
> > found
> > > > one
> > > > > for 'autocommit.interval.ms' (no '.' between auto and commit) in
> the
> > > > > Google
> > > > > Cache only. Which spelling is it? Also is my description of it
> > correct?
> > > > >
> > > > > I'll take a look at custom encoders later this week. Today and
> > Tuesday
> > > > are
> > > > > going to be pretty busy.
> > > > >
> > > > > Please let me know if there are changes needed to the High Level
> > > Consumer
> > > > > page.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Chris
> > > > >
> > > > >
> > > > > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao 
> wrote:
> > > > >
> > > > > > Chris,
> > > > > >
> > > > > > Any update of the high level consumer example?
> > > > > >
> > > > > > Also, in the Producer example, it would be useful to describe how
> > to
> > > > > write
> > > > > > a customized encoder. One subtle thing is that the encoder needs
> a
> > > > > > constructor that takes a a single VerifiableProperties argument (
> > > > > > https://issues.apache.org/jira/browse/KAFKA-869).
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Chris Curtin
Thanks, I missed that the addition of consumers can cause a re-balance.
Thought it was only on Leader changes.

I've updated the wording in the example.

I'll pull down the beta and test my application then change the names on
the properties.

Thanks,

Chris


On Mon, Apr 29, 2013 at 11:58 AM, Jun Rao  wrote:

> Basically, every time a consumer joins a group, every consumer in the
> groups gets a ZK notification and each of them tries to own a subset of the
> total number of partitions. A given partition is only assigned to one of
> the consumers in the same group. Once the ownership is determined, each
> consumer consumes messages coming from its partitions and manages the
> offset of those partitions. Since at any given point of time, a partition
> is only owned by one consumer, there won't be conflicts on updating the
> offsets. More details are described in the "consumer rebalancing algorithm"
> section of http://kafka.apache.org/07/design.html
>
> Thanks,
>
> Jun
>
>
> On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin  >wrote:
>
> > Jun, can you explain this a little better? I thought when using Consumer
> > Groups that on startup Kafka connects to ZooKeeper and finds the last
> read
> > offset for every partition in the topic being requested for the group.
> That
> > is then the starting point for the consumer threads.
> >
> > If a second process starts while the first one is running with the same
> > Consumer Group, won't the second one read the last offsets consumed by
> the
> > already running process and start processing from there? Then as the
> first
> > process syncs consumed offsets, won't the 2nd process's next update
> > overwrite them?
> >
> > Thanks,
> >
> > Chris
> >
> >
> >
> >
> > On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao  wrote:
> >
> > > Chris,
> > >
> > > Thanks for the writeup. Looks great overall. A couple of comments.
> > >
> > > 1. At the beginning, it sounds like that one can't run multiple
> processes
> > > of consumers in the same group. This is actually not true. We can
> create
> > > multiple instances of consumers for the same group in the same JVM or
> > > different JVMs. The consumers will auto-balance among themselves.
> > >
> > > 2. We have changed the name of some config properties.
> > > auto.commit.interval.ms is correct. However, zk.connect,
> > > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > > zookeeper.connect,
> > > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms,
> respectively.
> > >
> > > I will add a link to your wiki in our website.
> > >
> > > Thanks again.
> > >
> > > Jun
> > >
> > >
> > > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin  > > >wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > I finished and published it this morning:
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > >
> > > > One question: when documenting the ConsumerConfig parameters I
> couldn't
> > > > find a description for the 'auto.commit.interval.ms' setting. I
> found
> > > one
> > > > for 'autocommit.interval.ms' (no '.' between auto and commit) in the
> > > > Google
> > > > Cache only. Which spelling is it? Also is my description of it
> correct?
> > > >
> > > > I'll take a look at custom encoders later this week. Today and
> Tuesday
> > > are
> > > > going to be pretty busy.
> > > >
> > > > Please let me know if there are changes needed to the High Level
> > Consumer
> > > > page.
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao  wrote:
> > > >
> > > > > Chris,
> > > > >
> > > > > Any update of the high level consumer example?
> > > > >
> > > > > Also, in the Producer example, it would be useful to describe how
> to
> > > > write
> > > > > a customized encoder. One subtle thing is that the encoder needs a
> > > > > constructor that takes a a single VerifiableProperties argument (
> > > > > https://issues.apache.org/jira/browse/KAFKA-869).
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Jun Rao
Basically, every time a consumer joins a group, every consumer in the
groups gets a ZK notification and each of them tries to own a subset of the
total number of partitions. A given partition is only assigned to one of
the consumers in the same group. Once the ownership is determined, each
consumer consumes messages coming from its partitions and manages the
offset of those partitions. Since at any given point of time, a partition
is only owned by one consumer, there won't be conflicts on updating the
offsets. More details are described in the "consumer rebalancing algorithm"
section of http://kafka.apache.org/07/design.html

Thanks,

Jun


On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin wrote:

> Jun, can you explain this a little better? I thought when using Consumer
> Groups that on startup Kafka connects to ZooKeeper and finds the last read
> offset for every partition in the topic being requested for the group. That
> is then the starting point for the consumer threads.
>
> If a second process starts while the first one is running with the same
> Consumer Group, won't the second one read the last offsets consumed by the
> already running process and start processing from there? Then as the first
> process syncs consumed offsets, won't the 2nd process's next update
> overwrite them?
>
> Thanks,
>
> Chris
>
>
>
>
> On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao  wrote:
>
> > Chris,
> >
> > Thanks for the writeup. Looks great overall. A couple of comments.
> >
> > 1. At the beginning, it sounds like that one can't run multiple processes
> > of consumers in the same group. This is actually not true. We can create
> > multiple instances of consumers for the same group in the same JVM or
> > different JVMs. The consumers will auto-balance among themselves.
> >
> > 2. We have changed the name of some config properties.
> > auto.commit.interval.ms is correct. However, zk.connect,
> > zk.session.timeout.ms and zk.sync.time.ms are changed to
> > zookeeper.connect,
> > zookeeper.session.timeout.ms, and zookeeper.sync.time.ms, respectively.
> >
> > I will add a link to your wiki in our website.
> >
> > Thanks again.
> >
> > Jun
> >
> >
> > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin  > >wrote:
> >
> > > Hi Jun,
> > >
> > > I finished and published it this morning:
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > >
> > > One question: when documenting the ConsumerConfig parameters I couldn't
> > > find a description for the 'auto.commit.interval.ms' setting. I found
> > one
> > > for 'autocommit.interval.ms' (no '.' between auto and commit) in the
> > > Google
> > > Cache only. Which spelling is it? Also is my description of it correct?
> > >
> > > I'll take a look at custom encoders later this week. Today and Tuesday
> > are
> > > going to be pretty busy.
> > >
> > > Please let me know if there are changes needed to the High Level
> Consumer
> > > page.
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao  wrote:
> > >
> > > > Chris,
> > > >
> > > > Any update of the high level consumer example?
> > > >
> > > > Also, in the Producer example, it would be useful to describe how to
> > > write
> > > > a customized encoder. One subtle thing is that the encoder needs a
> > > > constructor that takes a a single VerifiableProperties argument (
> > > > https://issues.apache.org/jira/browse/KAFKA-869).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Chris Curtin
Jun, can you explain this a little better? I thought when using Consumer
Groups that on startup Kafka connects to ZooKeeper and finds the last read
offset for every partition in the topic being requested for the group. That
is then the starting point for the consumer threads.

If a second process starts while the first one is running with the same
Consumer Group, won't the second one read the last offsets consumed by the
already running process and start processing from there? Then as the first
process syncs consumed offsets, won't the 2nd process's next update
overwrite them?

Thanks,

Chris




On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao  wrote:

> Chris,
>
> Thanks for the writeup. Looks great overall. A couple of comments.
>
> 1. At the beginning, it sounds like that one can't run multiple processes
> of consumers in the same group. This is actually not true. We can create
> multiple instances of consumers for the same group in the same JVM or
> different JVMs. The consumers will auto-balance among themselves.
>
> 2. We have changed the name of some config properties.
> auto.commit.interval.ms is correct. However, zk.connect,
> zk.session.timeout.ms and zk.sync.time.ms are changed to
> zookeeper.connect,
> zookeeper.session.timeout.ms, and zookeeper.sync.time.ms, respectively.
>
> I will add a link to your wiki in our website.
>
> Thanks again.
>
> Jun
>
>
> On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin  >wrote:
>
> > Hi Jun,
> >
> > I finished and published it this morning:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> >
> > One question: when documenting the ConsumerConfig parameters I couldn't
> > find a description for the 'auto.commit.interval.ms' setting. I found
> one
> > for 'autocommit.interval.ms' (no '.' between auto and commit) in the
> > Google
> > Cache only. Which spelling is it? Also is my description of it correct?
> >
> > I'll take a look at custom encoders later this week. Today and Tuesday
> are
> > going to be pretty busy.
> >
> > Please let me know if there are changes needed to the High Level Consumer
> > page.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao  wrote:
> >
> > > Chris,
> > >
> > > Any update of the high level consumer example?
> > >
> > > Also, in the Producer example, it would be useful to describe how to
> > write
> > > a customized encoder. One subtle thing is that the encoder needs a
> > > constructor that takes a a single VerifiableProperties argument (
> > > https://issues.apache.org/jira/browse/KAFKA-869).
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Jun Rao
Chris,

Thanks for the writeup. Looks great overall. A couple of comments.

1. At the beginning, it sounds like that one can't run multiple processes
of consumers in the same group. This is actually not true. We can create
multiple instances of consumers for the same group in the same JVM or
different JVMs. The consumers will auto-balance among themselves.

2. We have changed the name of some config properties.
auto.commit.interval.ms is correct. However, zk.connect,
zk.session.timeout.ms and zk.sync.time.ms are changed to zookeeper.connect,
zookeeper.session.timeout.ms, and zookeeper.sync.time.ms, respectively.

I will add a link to your wiki in our website.

Thanks again.

Jun


On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin wrote:

> Hi Jun,
>
> I finished and published it this morning:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>
> One question: when documenting the ConsumerConfig parameters I couldn't
> find a description for the 'auto.commit.interval.ms' setting. I found one
> for 'autocommit.interval.ms' (no '.' between auto and commit) in the
> Google
> Cache only. Which spelling is it? Also is my description of it correct?
>
> I'll take a look at custom encoders later this week. Today and Tuesday are
> going to be pretty busy.
>
> Please let me know if there are changes needed to the High Level Consumer
> page.
>
> Thanks,
>
> Chris
>
>
> On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao  wrote:
>
> > Chris,
> >
> > Any update of the high level consumer example?
> >
> > Also, in the Producer example, it would be useful to describe how to
> write
> > a customized encoder. One subtle thing is that the encoder needs a
> > constructor that takes a a single VerifiableProperties argument (
> > https://issues.apache.org/jira/browse/KAFKA-869).
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-29 Thread Chris Curtin
Hi Jun,

I finished and published it this morning:

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

One question: when documenting the ConsumerConfig parameters I couldn't
find a description for the 'auto.commit.interval.ms' setting. I found one
for 'autocommit.interval.ms' (no '.' between auto and commit) in the Google
Cache only. Which spelling is it? Also is my description of it correct?

I'll take a look at custom encoders later this week. Today and Tuesday are
going to be pretty busy.

Please let me know if there are changes needed to the High Level Consumer
page.

Thanks,

Chris


On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao  wrote:

> Chris,
>
> Any update of the high level consumer example?
>
> Also, in the Producer example, it would be useful to describe how to write
> a customized encoder. One subtle thing is that the encoder needs a
> constructor that takes a a single VerifiableProperties argument (
> https://issues.apache.org/jira/browse/KAFKA-869).
>
> Thanks,
>
> Jun
>
>
>
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-28 Thread Jun Rao
Chris,

Any update of the high level consumer example?

Also, in the Producer example, it would be useful to describe how to write
a customized encoder. One subtle thing is that the encoder needs a
constructor that takes a a single VerifiableProperties argument (
https://issues.apache.org/jira/browse/KAFKA-869).

Thanks,

Jun


On Mon, Apr 22, 2013 at 10:40 AM, Chris Curtin wrote:

> Hi Jun,
>
> #1 and #2 are done, thanks for the code-review!
>
> I'll work on getting a High Level consumer example this week. I don't have
> one readily usable (we quickly found the lack of control over offsets
> didn't meet our needs) but I can get something this week.
>
> Congratulations on getting closer to Beta!
>
> Chris
>
>
> On Mon, Apr 22, 2013 at 12:26 PM, Jun Rao  wrote:
>
> > Chris,
> >
> > Thanks for the wiki. We are getting close to releasing 0.8.0 beta and
> your
> > writeup is very helpful. The following are some comments for the 0.8
> > Producer wiki.
> >
> > 1. The following sentence is inaccurate. The producer will do random
> > assignment as long as the key in KeyedMessage is null. If a key is not
> > null, it will use the default partitioner if partitioner.class is not
> set.
> > By default if you don't include a partitioner.class Kafka will randomly
> > assign the message to a partition.
> >
> > 2. In the following sentence, the first type is the key and the second
> type
> > is the message.
> > The first is the type of the message, the second the type of the
> Partition
> > key.
> >
> > 3. Could you explain the "key.serializer.class" property too?
> >
> > In addition to the 0.8 SimpleConsumer wiki, could you write up one for
> the
> > 0.8 high level consumer?
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin  > >wrote:
> >
> > > Hi,
> > >
> > > I've added an example program for using a SimpleConsumer for 0.8.0.
> Turns
> > > out to be a little more complicated once you add Broker failover. I'm
> not
> > > 100% thrilled with how I detect and recover, so if someone has a better
> > way
> > > of doing this please let me (and this list) know.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  > > >wrote:
> > >
> > > >
> > > > Hi David,
> > > >
> > > > Thanks for the feedback. I've seen the example before and after in
> > > > different books/articles and it doesn't matter to me.
> > > >
> > > > Anyone else want to help define a style guide or is there one I
> didn't
> > > see
> > > > already?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur 
> > wrote:
> > > >
> > > >> This looks great! A few comments
> > > >>
> > > >> * I think it would be useful to start with a complete example (ready
> > to
> > > >> copy/paste) and then break it down bit by bit
> > > >> * Some of the formatting is funky (gratuitous newlines), also I
> think
> > 2
> > > >> spaces looks nicer than 4
> > > >> * In the text, it might be useful to embolden or italicize class
> names
> > > >>
> > > >> Also, maybe we should move this to a separate thread?
> > > >>
> > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-24 Thread Jun Rao
For #5, if you always start from an offset returned by getOffsetBefore, it
won't happen since getOffsetBefore will always return offset at the
compressed messageSet boundary. However, if you start consuming from an
arbitrary offset, you may see this.

Thanks,

Jun


On Wed, Apr 24, 2013 at 11:12 AM, Chris Curtin wrote:

> Hi Jun,
>
> I've made some of the changes:
>
> #1 - was doing this in the leader identification, but not on startup. I've
> cleaned that up
>
> #2 - thoughts on how to word this comment? I'm not sure how to point out
> not to do something we didn't do :)
>
> #3 Fixed
>
> #4 I'll need to spend a bunch more time refactoring this than I have right
> now. The point about the different ports isn't something I considered since
> our DevOps guys build everything from recipes so they are always the same,
> but I can see how in a non-cookie cutter world that could happen.
>
> #5 Good to know. I've updated the example. However I couldn't reproduce
> this even with a fairly small fetch buffer so I'm not 100% sure the check
> is correct. Can you take a look and make sure I don't have an off-by-one
> error? Or suggest how to make it happen?
>
> Thanks,
>
> Chris
>
>
> On Wed, Apr 24, 2013 at 12:26 PM, Jun Rao  wrote:
>
> > Chris,
> >
> > The following are some comments on the SimpleConsumer wiki.
> >
> > 1. PartitionMetadata.leader() can return null if the new leader is not
> > elected yet. We need to handle that.
> > 2. When using FetchRequestBuilder, it's important NOT to set replicaId
> > since this is only meant to be used by fetchers in the follower replicas.
> > Setting replicaId incorrectly will cause the broker to behave strangely.
> We
> > didn't do that in the code, but it would be useful to add a comment to
> > highlight this.
> > 3. The following code doesn't match the comment. The comment says
> resetting
> > to the last offset, but the code resets to the first offset.
> >
> > // We asked for an invalid offset. For simple case
> > ask for the last element to resetreadOffset =
> > getLastOffset(consumer,a_topic, a_partition,
> > kafka.api.OffsetRequest.EarliestTime(), clientName);
> >
> > 4. It seems that we can combine findNewLeader() and fineLeader() into one
> > method. In particular, the port of the new leader may not be the same as
> > the old one.
> > 5. There is one more surprise when iterating messages in a messageSet
> > retuned in a fetch response. In general, if the fetch offset is X, you
> may
> > get messages with offsets less than X in the returned messageSet. This is
> > because in 0.8, each message in a compressed unit has its own offset. So,
> > if a compressed messageSet contains 10 messages with offsets from 0 to 9
> > and the client wants to fetch from offset 5, the broker is going to
> return
> > all 10 messages. So, it's the responsibility of the client to skip the
> > first 5 messages with offsets from 0 to 4 and only consume messages with
> > offset 5 and above. Otherwise, the client may get duplicates. This is
> > handled in the high level consumer and we need to handle that in
> > SimpleConsumer as well.
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin  > >wrote:
> >
> > > Hi,
> > >
> > > I've added an example program for using a SimpleConsumer for 0.8.0.
> Turns
> > > out to be a little more complicated once you add Broker failover. I'm
> not
> > > 100% thrilled with how I detect and recover, so if someone has a better
> > way
> > > of doing this please let me (and this list) know.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  > > >wrote:
> > >
> > > >
> > > > Hi David,
> > > >
> > > > Thanks for the feedback. I've seen the example before and after in
> > > > different books/articles and it doesn't matter to me.
> > > >
> > > > Anyone else want to help define a style guide or is there one I
> didn't
> > > see
> > > > already?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur 
> > wrote:
> > > >
> > > >> This looks great! A few comments
> > > >>
> > > >> * I think it would be useful to start with a complete example (ready
> > to
> > > >> copy/paste) and then break it down bit by bit
> > > >> * Some of the formatting is funky (gratuitous newlines), also I
> think
> > 2
> > > >> spaces looks nicer than 4
> > > >> * In the text, it might be useful to embolden or italicize class
> names
> > > >>
> > > >> Also, maybe we should move this to a separate thread?
> > > >>
> > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-24 Thread Chris Curtin
Hi Jun,

I've made some of the changes:

#1 - was doing this in the leader identification, but not on startup. I've
cleaned that up

#2 - thoughts on how to word this comment? I'm not sure how to point out
not to do something we didn't do :)

#3 Fixed

#4 I'll need to spend a bunch more time refactoring this than I have right
now. The point about the different ports isn't something I considered since
our DevOps guys build everything from recipes so they are always the same,
but I can see how in a non-cookie cutter world that could happen.

#5 Good to know. I've updated the example. However I couldn't reproduce
this even with a fairly small fetch buffer so I'm not 100% sure the check
is correct. Can you take a look and make sure I don't have an off-by-one
error? Or suggest how to make it happen?

Thanks,

Chris


On Wed, Apr 24, 2013 at 12:26 PM, Jun Rao  wrote:

> Chris,
>
> The following are some comments on the SimpleConsumer wiki.
>
> 1. PartitionMetadata.leader() can return null if the new leader is not
> elected yet. We need to handle that.
> 2. When using FetchRequestBuilder, it's important NOT to set replicaId
> since this is only meant to be used by fetchers in the follower replicas.
> Setting replicaId incorrectly will cause the broker to behave strangely. We
> didn't do that in the code, but it would be useful to add a comment to
> highlight this.
> 3. The following code doesn't match the comment. The comment says resetting
> to the last offset, but the code resets to the first offset.
>
> // We asked for an invalid offset. For simple case
> ask for the last element to resetreadOffset =
> getLastOffset(consumer,a_topic, a_partition,
> kafka.api.OffsetRequest.EarliestTime(), clientName);
>
> 4. It seems that we can combine findNewLeader() and fineLeader() into one
> method. In particular, the port of the new leader may not be the same as
> the old one.
> 5. There is one more surprise when iterating messages in a messageSet
> retuned in a fetch response. In general, if the fetch offset is X, you may
> get messages with offsets less than X in the returned messageSet. This is
> because in 0.8, each message in a compressed unit has its own offset. So,
> if a compressed messageSet contains 10 messages with offsets from 0 to 9
> and the client wants to fetch from offset 5, the broker is going to return
> all 10 messages. So, it's the responsibility of the client to skip the
> first 5 messages with offsets from 0 to 4 and only consume messages with
> offset 5 and above. Otherwise, the client may get duplicates. This is
> handled in the high level consumer and we need to handle that in
> SimpleConsumer as well.
>
> Thanks,
>
> Jun
>
>
>
> On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin  >wrote:
>
> > Hi,
> >
> > I've added an example program for using a SimpleConsumer for 0.8.0. Turns
> > out to be a little more complicated once you add Broker failover. I'm not
> > 100% thrilled with how I detect and recover, so if someone has a better
> way
> > of doing this please let me (and this list) know.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  > >wrote:
> >
> > >
> > > Hi David,
> > >
> > > Thanks for the feedback. I've seen the example before and after in
> > > different books/articles and it doesn't matter to me.
> > >
> > > Anyone else want to help define a style guide or is there one I didn't
> > see
> > > already?
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur 
> wrote:
> > >
> > >> This looks great! A few comments
> > >>
> > >> * I think it would be useful to start with a complete example (ready
> to
> > >> copy/paste) and then break it down bit by bit
> > >> * Some of the formatting is funky (gratuitous newlines), also I think
> 2
> > >> spaces looks nicer than 4
> > >> * In the text, it might be useful to embolden or italicize class names
> > >>
> > >> Also, maybe we should move this to a separate thread?
> > >>
> > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-24 Thread Jun Rao
Chris,

The following are some comments on the SimpleConsumer wiki.

1. PartitionMetadata.leader() can return null if the new leader is not
elected yet. We need to handle that.
2. When using FetchRequestBuilder, it's important NOT to set replicaId
since this is only meant to be used by fetchers in the follower replicas.
Setting replicaId incorrectly will cause the broker to behave strangely. We
didn't do that in the code, but it would be useful to add a comment to
highlight this.
3. The following code doesn't match the comment. The comment says resetting
to the last offset, but the code resets to the first offset.

// We asked for an invalid offset. For simple case
ask for the last element to resetreadOffset =
getLastOffset(consumer,a_topic, a_partition,
kafka.api.OffsetRequest.EarliestTime(), clientName);

4. It seems that we can combine findNewLeader() and fineLeader() into one
method. In particular, the port of the new leader may not be the same as
the old one.
5. There is one more surprise when iterating messages in a messageSet
retuned in a fetch response. In general, if the fetch offset is X, you may
get messages with offsets less than X in the returned messageSet. This is
because in 0.8, each message in a compressed unit has its own offset. So,
if a compressed messageSet contains 10 messages with offsets from 0 to 9
and the client wants to fetch from offset 5, the broker is going to return
all 10 messages. So, it's the responsibility of the client to skip the
first 5 messages with offsets from 0 to 4 and only consume messages with
offset 5 and above. Otherwise, the client may get duplicates. This is
handled in the high level consumer and we need to handle that in
SimpleConsumer as well.

Thanks,

Jun



On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin wrote:

> Hi,
>
> I've added an example program for using a SimpleConsumer for 0.8.0. Turns
> out to be a little more complicated once you add Broker failover. I'm not
> 100% thrilled with how I detect and recover, so if someone has a better way
> of doing this please let me (and this list) know.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> Thanks,
>
> Chris
>
>
> On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  >wrote:
>
> >
> > Hi David,
> >
> > Thanks for the feedback. I've seen the example before and after in
> > different books/articles and it doesn't matter to me.
> >
> > Anyone else want to help define a style guide or is there one I didn't
> see
> > already?
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur  wrote:
> >
> >> This looks great! A few comments
> >>
> >> * I think it would be useful to start with a complete example (ready to
> >> copy/paste) and then break it down bit by bit
> >> * Some of the formatting is funky (gratuitous newlines), also I think 2
> >> spaces looks nicer than 4
> >> * In the text, it might be useful to embolden or italicize class names
> >>
> >> Also, maybe we should move this to a separate thread?
> >>
> >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-22 Thread Jun Rao
Chris,

Thanks. Once your high level producer example is ready, I will update the
0.8 quick start page (KAFKA-836) with the new link.

Jun


On Mon, Apr 22, 2013 at 10:40 AM, Chris Curtin wrote:

> Hi Jun,
>
> #1 and #2 are done, thanks for the code-review!
>
> I'll work on getting a High Level consumer example this week. I don't have
> one readily usable (we quickly found the lack of control over offsets
> didn't meet our needs) but I can get something this week.
>
> Congratulations on getting closer to Beta!
>
> Chris
>
>
> On Mon, Apr 22, 2013 at 12:26 PM, Jun Rao  wrote:
>
> > Chris,
> >
> > Thanks for the wiki. We are getting close to releasing 0.8.0 beta and
> your
> > writeup is very helpful. The following are some comments for the 0.8
> > Producer wiki.
> >
> > 1. The following sentence is inaccurate. The producer will do random
> > assignment as long as the key in KeyedMessage is null. If a key is not
> > null, it will use the default partitioner if partitioner.class is not
> set.
> > By default if you don't include a partitioner.class Kafka will randomly
> > assign the message to a partition.
> >
> > 2. In the following sentence, the first type is the key and the second
> type
> > is the message.
> > The first is the type of the message, the second the type of the
> Partition
> > key.
> >
> > 3. Could you explain the "key.serializer.class" property too?
> >
> > In addition to the 0.8 SimpleConsumer wiki, could you write up one for
> the
> > 0.8 high level consumer?
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin  > >wrote:
> >
> > > Hi,
> > >
> > > I've added an example program for using a SimpleConsumer for 0.8.0.
> Turns
> > > out to be a little more complicated once you add Broker failover. I'm
> not
> > > 100% thrilled with how I detect and recover, so if someone has a better
> > way
> > > of doing this please let me (and this list) know.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  > > >wrote:
> > >
> > > >
> > > > Hi David,
> > > >
> > > > Thanks for the feedback. I've seen the example before and after in
> > > > different books/articles and it doesn't matter to me.
> > > >
> > > > Anyone else want to help define a style guide or is there one I
> didn't
> > > see
> > > > already?
> > > >
> > > > Thanks,
> > > >
> > > > Chris
> > > >
> > > >
> > > > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur 
> > wrote:
> > > >
> > > >> This looks great! A few comments
> > > >>
> > > >> * I think it would be useful to start with a complete example (ready
> > to
> > > >> copy/paste) and then break it down bit by bit
> > > >> * Some of the formatting is funky (gratuitous newlines), also I
> think
> > 2
> > > >> spaces looks nicer than 4
> > > >> * In the text, it might be useful to embolden or italicize class
> names
> > > >>
> > > >> Also, maybe we should move this to a separate thread?
> > > >>
> > > >
> > > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-22 Thread Chris Curtin
Hi Jun,

#1 and #2 are done, thanks for the code-review!

I'll work on getting a High Level consumer example this week. I don't have
one readily usable (we quickly found the lack of control over offsets
didn't meet our needs) but I can get something this week.

Congratulations on getting closer to Beta!

Chris


On Mon, Apr 22, 2013 at 12:26 PM, Jun Rao  wrote:

> Chris,
>
> Thanks for the wiki. We are getting close to releasing 0.8.0 beta and your
> writeup is very helpful. The following are some comments for the 0.8
> Producer wiki.
>
> 1. The following sentence is inaccurate. The producer will do random
> assignment as long as the key in KeyedMessage is null. If a key is not
> null, it will use the default partitioner if partitioner.class is not set.
> By default if you don't include a partitioner.class Kafka will randomly
> assign the message to a partition.
>
> 2. In the following sentence, the first type is the key and the second type
> is the message.
> The first is the type of the message, the second the type of the Partition
> key.
>
> 3. Could you explain the "key.serializer.class" property too?
>
> In addition to the 0.8 SimpleConsumer wiki, could you write up one for the
> 0.8 high level consumer?
>
> Thanks,
>
> Jun
>
>
>
> On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin  >wrote:
>
> > Hi,
> >
> > I've added an example program for using a SimpleConsumer for 0.8.0. Turns
> > out to be a little more complicated once you add Broker failover. I'm not
> > 100% thrilled with how I detect and recover, so if someone has a better
> way
> > of doing this please let me (and this list) know.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  > >wrote:
> >
> > >
> > > Hi David,
> > >
> > > Thanks for the feedback. I've seen the example before and after in
> > > different books/articles and it doesn't matter to me.
> > >
> > > Anyone else want to help define a style guide or is there one I didn't
> > see
> > > already?
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur 
> wrote:
> > >
> > >> This looks great! A few comments
> > >>
> > >> * I think it would be useful to start with a complete example (ready
> to
> > >> copy/paste) and then break it down bit by bit
> > >> * Some of the formatting is funky (gratuitous newlines), also I think
> 2
> > >> spaces looks nicer than 4
> > >> * In the text, it might be useful to embolden or italicize class names
> > >>
> > >> Also, maybe we should move this to a separate thread?
> > >>
> > >
> > >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-04-22 Thread Jun Rao
Chris,

Thanks for the wiki. We are getting close to releasing 0.8.0 beta and your
writeup is very helpful. The following are some comments for the 0.8
Producer wiki.

1. The following sentence is inaccurate. The producer will do random
assignment as long as the key in KeyedMessage is null. If a key is not
null, it will use the default partitioner if partitioner.class is not set.
By default if you don't include a partitioner.class Kafka will randomly
assign the message to a partition.

2. In the following sentence, the first type is the key and the second type
is the message.
The first is the type of the message, the second the type of the Partition
key.

3. Could you explain the "key.serializer.class" property too?

In addition to the 0.8 SimpleConsumer wiki, could you write up one for the
0.8 high level consumer?

Thanks,

Jun



On Fri, Mar 29, 2013 at 8:28 AM, Chris Curtin wrote:

> Hi,
>
> I've added an example program for using a SimpleConsumer for 0.8.0. Turns
> out to be a little more complicated once you add Broker failover. I'm not
> 100% thrilled with how I detect and recover, so if someone has a better way
> of doing this please let me (and this list) know.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> Thanks,
>
> Chris
>
>
> On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin  >wrote:
>
> >
> > Hi David,
> >
> > Thanks for the feedback. I've seen the example before and after in
> > different books/articles and it doesn't matter to me.
> >
> > Anyone else want to help define a style guide or is there one I didn't
> see
> > already?
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Thu, Mar 21, 2013 at 7:46 PM, David Arthur  wrote:
> >
> >> This looks great! A few comments
> >>
> >> * I think it would be useful to start with a complete example (ready to
> >> copy/paste) and then break it down bit by bit
> >> * Some of the formatting is funky (gratuitous newlines), also I think 2
> >> spaces looks nicer than 4
> >> * In the text, it might be useful to embolden or italicize class names
> >>
> >> Also, maybe we should move this to a separate thread?
> >>
> >
> >
>


Re: Kafka wiki Documentation conventions - looking for feedback

2013-03-29 Thread Chris Curtin
Hi,

I've added an example program for using a SimpleConsumer for 0.8.0. Turns
out to be a little more complicated once you add Broker failover. I'm not
100% thrilled with how I detect and recover, so if someone has a better way
of doing this please let me (and this list) know.

https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

Thanks,

Chris


On Mon, Mar 25, 2013 at 8:02 AM, Chris Curtin wrote:

>
> Hi David,
>
> Thanks for the feedback. I've seen the example before and after in
> different books/articles and it doesn't matter to me.
>
> Anyone else want to help define a style guide or is there one I didn't see
> already?
>
> Thanks,
>
> Chris
>
>
> On Thu, Mar 21, 2013 at 7:46 PM, David Arthur  wrote:
>
>> This looks great! A few comments
>>
>> * I think it would be useful to start with a complete example (ready to
>> copy/paste) and then break it down bit by bit
>> * Some of the formatting is funky (gratuitous newlines), also I think 2
>> spaces looks nicer than 4
>> * In the text, it might be useful to embolden or italicize class names
>>
>> Also, maybe we should move this to a separate thread?
>>
>
>