one consumerConnector or many?

2013-05-29 Thread Rob Withers
In thinking about the design of consumption, we have in mind a generic
consumer server which would consume from more than one message type.  The
handling of each type of message would be different.  I suppose we could
have upwards of say 50 different message types, eventually, maybe 100+
different types.  Which of the following designs would be best and why would
the other options be bad?

 

1)  Have all message types go through one topic and use a dispatcher
pattern to select the correct handler.  Use one consumerConnector.

2)  Use a different topic for each message type, but still use one
consumerConnector and a dispatcher pattern.

3)  Use a different topic for each message type and have a separate
consumerConnector for each topic.

 

I am struggling with whether my assumptions are correct.  It seems that a
single connector for a topic would establish one socket to each broker, as
rebalancing assigns various partitions to that thread.  Option 2 would pull
messages from more than one topic through a single socket to a particular
broker, is it so?  Would option 3 be reasonable, establishing upwards of 100
sockets per broker?  

 

I am guesstimating that option 2 is the right way forward, to bound socket
use, and we'll need to figure out a way to parameterize stream consumption
with the right handlers for a particular msg type.  If we add a topic, do
you think we should create a new connector or restart the original connector
with the new topic in the map?

 

Thanks,

rob
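
A minimal sketch of the dispatcher pattern from option 2, where the topic a
message arrives on selects its handler. The topic names and handler functions
are hypothetical, and the Kafka consumer itself is left out: the sketch
assumes something else pulls (topic, payload) pairs off the streams and calls
dispatch().

```python
from typing import Callable, Dict

Handler = Callable[[bytes], str]


def handle_account_created(payload: bytes) -> str:
    return "created:" + payload.decode("utf-8")


def handle_account_closed(payload: bytes) -> str:
    return "closed:" + payload.decode("utf-8")


# Hypothetical topic names, one topic per message type (option 2).
HANDLERS: Dict[str, Handler] = {
    "account-created": handle_account_created,
    "account-closed": handle_account_closed,
}


def dispatch(topic: str, payload: bytes) -> str:
    """Route one message to the handler registered for its topic."""
    try:
        handler = HANDLERS[topic]
    except KeyError:
        raise ValueError(f"no handler registered for topic {topic!r}") from None
    return handler(payload)
```

Under this shape, adding a message type means adding one entry to the map;
whether the connector must be restarted to pick up the new topic is the open
question above.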



Re: one consumerConnector or many?

2013-05-29 Thread Chris Curtin
I'd look at a variation of #2. Can your messages be grouped into a 'class'
(for lack of a better term) whose members are consumed together? For
example, a 'class' of 'auditing events' or 'sensor events'. The idea would
then be to have one topic per 'class'.

A couple of benefits to this:
- you can size the resources spent on each 'class' by its value. So the
'audit' topic may only get a 2-threaded consumer while the 'sensor' class
gets a 10-threaded consumer.
- you can stop processing a 'class' of messages if you need to without
taking all the consumers offline (assuming you have different processors
or a way to alter your number of threads per topic while running).

Since it sounds like you may be frequently adding new message types, this
approach also allows you to decide if you want to shut down only a part of
your processing to add the new code to handle the message.

Finally, why the concern about socket use? A well-configured Windows or
Linux machine can have thousands of open sockets without problems. Since
0.8.0 only connects to the broker hosting the topic/partition, you end up
with 1 socket per topic/partition and consumer.

Hope this helps,

Chris
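
A rough sketch of the per-class sizing described above, using plain Python
thread pools as stand-ins for consumer threads. The 2 and 10 thread counts
come from the example in this message; the function names are hypothetical
and nothing here talks to Kafka.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Threads per message class, using the 'audit' and 'sensor' figures above.
THREADS_PER_CLASS: Dict[str, int] = {"audit": 2, "sensor": 10}


def start_pools(threads_per_class: Dict[str, int]) -> Dict[str, ThreadPoolExecutor]:
    """Create one independently sized worker pool per message class."""
    return {
        name: ThreadPoolExecutor(max_workers=count, thread_name_prefix=name)
        for name, count in threads_per_class.items()
    }


def stop_class(pools: Dict[str, ThreadPoolExecutor], name: str) -> None:
    """Stop one class; the pools for the other classes keep running."""
    pools.pop(name).shutdown(wait=True)
```

Calling stop_class(pools, "audit") drains and stops only the audit workers,
which is the kind of partial shutdown that a single shared consumer would
not allow.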


RE: one consumerConnector or many?

2013-05-29 Thread Withers, Robert
Thanks for the info.  Are you saying that even with a single connector, with 
say 3 topics and 3 threads per topic and 3 brokers with 3 partitions for all 3 
topics on all 3 brokers, that a consumer box would have 9 sockets open?  What 
if there are 6 partitions per topic, would that be 18 open sockets?

I have read somewhere that a high partition number, per topic, is desirable, to 
scale out the consumers and to be prepared to dynamically scale out consumption 
during a traffic spike.  Is it so?  100 topics, with 16 brokers and 200 
partitions per topic with 1 consumer connector (just hypothetically so) would 
be 1600 sockets or 2 sockets?

For sure these boxes have plenty of ports.  I am just thinking through possible 
failures and what flexibility we have in configuration of producers/consumers 
to topics.  Really, the question is about best practices in this area.  A
producer server handling 100+ msg types could also open quite a few
connections.  So perhaps it is best to limit each producer and consumer server
to a restricted class of types.  That certainly applies if the producer is also
hosting a web server; it is perhaps less pressing on the consumer side.

thanks,
rob  


Re: one consumerConnector or many?

2013-05-29 Thread Chris Curtin
That's a good question about the number of sockets when a single consumer is
connecting. I'll let someone from LinkedIn comment on whether each consumer
has a socket per topic/partition or per broker, since I'm not familiar
with that part of the code.


Re: one consumerConnector or many?

2013-05-29 Thread Jun Rao
Rob,

You are correct that each consumer instance will use a single socket to
connect to a broker, independent of the number of topics/partitions. One
thing that's good to avoid is reading all the data and filtering in the
consumer, especially when the data is consumed multiple times by different
consumers. In that case, it's better to put the filtered data in a separate
topic and let all consumers consume the filtered data directly.

Thanks,

Jun
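
The socket arithmetic implied by this answer can be sketched as follows. It
assumes each consumerConnector counts as one "instance of consumer" and that
every instance ends up fetching from every broker, so the result is an upper
bound; ZooKeeper and any other connections are not counted. The function
name is hypothetical.

```python
def fetch_sockets(consumer_instances: int, brokers: int) -> int:
    """Upper bound on consumer-to-broker sockets: one per (instance, broker).

    Topics and partitions are deliberately not parameters because, per the
    answer above, they do not change the count.
    """
    return consumer_instances * brokers


# Scenarios raised earlier in the thread.
small_cluster = fetch_sockets(1, 3)          # 3 topics, 3 brokers, 1 connector
large_cluster = fetch_sockets(1, 16)         # 100 topics, 16 brokers, 1 connector
connector_per_topic = fetch_sockets(100, 16) # 100 topics, one connector each
```

Under those assumptions the three cases come to 3, 16 and 1600 sockets, so
socket use grows with the number of connectors rather than with the number
of topics or partitions.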




RE: one consumerConnector or many?

2013-05-29 Thread Withers, Robert
Thanks, Jun.  We have considered doing message filtering in the consumer.  
However, the thrust of my original question is not filtering, but dispatching.  If 
we take Chris' recommendation and pump a small set of msg types, belonging to 
the same class of messages, such as Account History, through the same topic, 
we will want to process all the messages, but we will want to process each msg 
type within the class differently, so we will want to dispatch to different 
handlers.

I totally see your point that if we only want to process a subset of the 
messages, then we really ought to filter in the producer and send the filtered 
message stream to its own topic.

I am leaning toward the architecture of having a different consumerConnector 
per topic, as there ARE plenty of ports.  This allows per topic control, which 
is useful.  Do you see any issues with this approach?

Thanks,
rob 
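
A sketch of dispatching within one class topic, where every message is
processed but each msg type gets its own handler. It assumes, purely for
illustration, that messages are JSON envelopes with "type" and "body"
fields; the type names, the envelope layout and the handles() decorator are
all hypothetical.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]
_HANDLERS: Dict[str, Handler] = {}


def handles(msg_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one message type."""
    def register(fn: Handler) -> Handler:
        _HANDLERS[msg_type] = fn
        return fn
    return register


@handles("deposit")
def on_deposit(body: dict) -> str:
    return f"deposit of {body['amount']}"


@handles("withdrawal")
def on_withdrawal(body: dict) -> str:
    return f"withdrawal of {body['amount']}"


def dispatch_by_type(raw: bytes) -> str:
    """Decode one envelope from the class topic and call its type's handler."""
    msg = json.loads(raw.decode("utf-8"))
    handler = _HANDLERS.get(msg["type"])
    if handler is None:
        raise ValueError(f"no handler for message type {msg['type']!r}")
    return handler(msg["body"])
```

Nothing is filtered out here: an unknown type is treated as an error rather
than skipped, which matches the intent of processing every message in the
class.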

