Re: Partitioning and scale

2013-05-24 Thread Neha Narkhede
Timothy,

Kafka is not designed to support millions of topics. ZooKeeper will become
a bottleneck, even if you deploy more brokers to get around the number-of-files
issue. In normal operation, it might work just fine with a right-sized
cluster. However, when there are failures, the time to recover could be a
few minutes instead of a few hundred milliseconds or a few seconds.
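
To put a rough number on the file-count side of this, here is a back-of-the-envelope sketch. It assumes each partition replica keeps at least one log segment on disk, made up of a .log file and an .index file; the replication factor of 3 and the 10-broker cluster are made-up figures for illustration only.

```python
def min_log_files(num_topics, partitions_per_topic, replication_factor,
                  files_per_segment=2):
    """Lower bound on log files cluster-wide.

    Assumes one segment (a .log file plus an .index file) per partition
    replica; real clusters roll additional segments over time.
    """
    return num_topics * partitions_per_topic * replication_factor * files_per_segment


# Hypothetical figures: one million single-partition topics,
# replication factor 3, spread evenly over 10 brokers.
total = min_log_files(1_000_000, 1, 3)
per_broker = total // 10

print(total)       # 6000000
print(per_broker)  # 600000
```

Adding brokers lowers the per-broker figure, but it does nothing for the per-topic metadata that ZooKeeper has to hold and replay on failure.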

Also, even if you go with the one-topic-per-session-id approach, it will not
scale as the key space grows. A more scalable
approach is what Milind described. Use the sticky partitioning feature in
0.8 and have each session id always get routed to a particular partition. Then
each of your consumers can be sure that it will always receive the data for a
fixed subset of the session ids, so you can do any locality-sensitive processing
of user sessions on your consumers.
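
As a minimal sketch of the idea (not Kafka's actual partitioner code), a stable hash of the session id taken modulo a fixed partition count sends every event of a session to the same partition. CRC32 is only a stand-in for whatever hash the producer's partitioner really uses, and the partition count and event data are hypothetical.

```python
import zlib

NUM_PARTITIONS = 16  # hypothetical, fixed when the topic is created


def partition_for(session_id: str, num_partitions: int) -> int:
    """Map a session id to a partition using a stable hash.

    zlib.crc32 is deterministic across processes, unlike Python's
    built-in hash() for strings.
    """
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


# Hypothetical event stream: (session id, event name).
events = [
    ("session-a", "login"),
    ("session-b", "login"),
    ("session-a", "click"),
    ("session-a", "logout"),
]

# Group events the way the partitions would see them.
by_partition = {}
for session_id, event in events:
    p = partition_for(session_id, NUM_PARTITIONS)
    by_partition.setdefault(p, []).append((session_id, event))
```

Whichever consumer owns a given partition sees the complete, ordered event list for every session that hashes to it.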

Thanks,
Neha



Re: Partitioning and scale

2013-05-23 Thread Timothy Chen
Hi Neha,

Not sure if this sounds crazy, but if we'd like the events for the
same session id to go to the same partition, one way could be for each session
key to create its own topic with a single partition, so there could be
millions of topics, each with a single partition.

What would the bottleneck be in doing this?

Thanks,

Tim




Re: Partitioning and scale

2013-05-23 Thread Milind Parikh
The number of files the OS has to manage, I suppose.

Why wouldn't you use consistent hashing with deliberately engineered
collisions to generate a limited number of topics / partitions and filter
at the consumer level?
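
A rough sketch of that suggestion: many session ids deliberately collide into a small, fixed set of buckets (topics or partitions), and each consumer filters out the sessions it cares about. For brevity this uses plain modulo hashing rather than a true consistent-hash ring; the bucket count and helper names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 64  # hypothetical cap on topics / partitions


def bucket_for(session_id: str) -> int:
    """Collapse an unbounded key space into NUM_BUCKETS buckets."""
    digest = hashlib.md5(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def filter_session(messages, wanted_session_id):
    """Consumer-side filter: keep payloads for one session from a shared bucket.

    `messages` is an iterable of (session_id, payload) pairs read from
    the bucket that the wanted session hashes to.
    """
    return [payload for sid, payload in messages if sid == wanted_session_id]


# Messages from two colliding sessions arrive interleaved in one bucket.
shared_bucket = [("a", "a-1"), ("b", "b-1"), ("a", "a-2")]
print(filter_session(shared_bucket, "a"))  # ['a-1', 'a-2']
```

The number of topics and partitions stays bounded no matter how many sessions exist; the cost is that each consumer reads and discards messages for sessions it does not own.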

Regards
Milind

Partitioning and scale

2013-05-22 Thread Timothy Chen
Hi,

I'm currently trying to understand how Kafka (0.8) can scale with our usage
pattern and how to set up the partitioning.

We want to route all messages belonging to the same id to the same
queue, so that its consumer will be able to consume all the messages for that id.

My questions:

- From my understanding, in Kafka we would need a custom
partitioner that routes messages with the same id to the same partition, right? I'm
trying to find examples of writing this partitioner logic, but I can't find
any. Can someone point me to an example?

- I see that Kafka's server.properties allows one to specify the number of
partitions it supports. However, when we want to scale, if we add
partitions or brokers, will the same partitioner start distributing
the messages to different partitions?
And if it does, how can the same consumer continue to read the
messages for those ids if it was interrupted in the middle?

- I'd like to create a consumer per partition, and have each one
subscribe to the changes of that partition. How can this be done in Kafka?

Thanks,

Tim


Re: Partitioning and scale

2013-05-22 Thread Chris Curtin
Hi Tim,


On Wed, May 22, 2013 at 3:25 PM, Timothy Chen tnac...@gmail.com wrote:

 Hi,

 I'm currently trying to understand how Kafka (0.8) can scale with our usage
 pattern and how to set up the partitioning.

 We want to route all messages belonging to the same id to the same
 queue, so that its consumer will be able to consume all the messages for that id.

 My questions:

  - From my understanding, in Kafka we would need to have a custom
 partitioner that routes the same messages to the same partition right?  I'm
 trying to find examples of writing this partitioner logic, but I can't find
 any. Can someone point me to an example?

https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

The partitioner there does a simple mod of the IP address by the number of
partitions. You'd need to define your own logic, but this is a start.
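
To illustrate the shape of that logic, here is a sketch in Python rather than the Java of the linked example. It assumes "mod on the IP address" means taking the last octet of a dotted IPv4 key modulo the partition count; the function name is hypothetical.

```python
def ip_partition(ip_address: str, num_partitions: int) -> int:
    """Pick a partition from the last octet of a dotted IPv4 address.

    Assumption: the message key is an IPv4 string such as "192.168.2.77".
    """
    last_octet = int(ip_address.rsplit(".", 1)[-1])
    return last_octet % num_partitions


print(ip_partition("192.168.2.77", 4))  # 77 % 4 -> 1
```

For session ids, the same pattern applies with a hash of the id in place of the last octet.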


 - I see that Kafka server.properties allows one to specify the number of
 partitions it supports. However, when we want to scale I wonder if we add #
 of partitions or # of brokers, will the same partitioner start distributing
 the messages to different partitions?
  And if it does, how can that same consumer continue to read off the
 messages of those ids if it was interrupted in the middle?


I'll let someone else answer this.



 - I'd like to create a consumer per partition, and for each one to
 subscribe to the changes of that one. How can this be done in kafka?


Two ways: SimpleConsumer or consumer groups.

Which one depends on how much control you want over which code processes a specific
partition vs. having one assigned to it (and how much control you want over offset
management).

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example



 Thanks,

 Tim



Re: Partitioning and scale

2013-05-22 Thread Neha Narkhede
 - I see that Kafka server.properties allows one to specify the number of
 partitions it supports. However, when we want to scale I wonder if we add #
 of partitions or # of brokers, will the same partitioner start distributing
 the messages to different partitions?
 And if it does, how can that same consumer continue to read off the
 messages of those ids if it was interrupted in the middle?

The num.partitions config in server.properties is used only for topics that
are auto-created (controlled by auto.create.topics.enable). For topics that
you create using the admin tool, you can specify the number of partitions
that you want. After that, there is currently no way to change it. For
that reason, it is a good idea to over-partition your topic, which also
helps balance partitions across the brokers. You are right that if you
change the number of partitions later, messages that previously stuck
to a certain partition would get routed to a different partition, which
is undesirable for applications that want to use sticky partitioning.
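
A small experiment makes the remapping concrete. This is a sketch that assumes a hash-modulo partitioner, with CRC32 standing in for the real hash; the key names and the 8 and 12 partition counts are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"session-{i}" for i in range(10_000)]

# Count keys whose partition changes when the topic grows from 8 to 12 partitions.
moved = sum(1 for k in keys if partition_for(k, 8) != partition_for(k, 12))
fraction = moved / len(keys)

print(f"{fraction:.0%} of keys land on a different partition")
```

With a uniform hash, a key keeps its partition only when its hash modulo 24 is below 8, so about two thirds of the keys move. Starting with more partitions than you need avoids having to make this change at all.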

 - I'd like to create a consumer per partition, and for each one to
 subscribe to the changes of that one. How can this be done in kafka?

For your use case, it seems like SimpleConsumer might be a better fit.
However, it will require you to write code to handle discovery of the leader
for the partition that your consumer is consuming. Chris has written up a
great example that you can follow -
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

Thanks,
Neha



Re: Partitioning and scale

2013-05-22 Thread Timothy Chen
Hi Neha/Chris,

Thanks for the reply. If I set a fixed number of partitions and just add
brokers to the broker pool, does Kafka rebalance the load onto the new brokers
(along with the data)?

Tim




Re: Partitioning and scale

2013-05-22 Thread Neha Narkhede
Not automatically as of today. You have to run the reassign-partitions tool
and explicitly move selected partitions to the new brokers. If you use this
tool, you can move partitions to the new broker without any downtime.

Thanks,
Neha

