[ 
https://issues.apache.org/jira/browse/KAFKA-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151826#comment-13151826
 ] 

Taylor Gautier commented on KAFKA-208:
--------------------------------------

Hi Jay.  Good question.

KAFKA-48 in its current description would allow a long poll for one topic at a 
time.  If a client is interested in only a few topics, it is conceivable that 
the client could open one TCP connection per topic and issue a blocking request 
on each, one topic per connection.  But this strategy rapidly becomes 
undesirable as the number of topics of interest increases.

As we discussed in the comments for KAFKA-48 it's possible to consider the use 
case for multi-fetch and long polling together.  However there wasn't a 
conclusion to that discussion in terms of a direction.  The proposal that we 
discussed would allow an interest set to be requested, and a long poll to 
then take place over all of the topics.  This is a reasonable solution to the 
problem, but it's not really going to address my needs, which is why I 
submitted this feature request.

The reasoning is simple: if I am interested in 10,000 topics and only a 
handful of them get messages, then KAFKA-48 is not going to address my 
signal-to-noise problem.  I will still have to poll for all 10,000 topics and 
get back a few messages at a time.  Depending on the numbers, that means 
requests of roughly 10k x 20 bytes = 200 kbytes (assuming an average topic 
name length of 20 bytes), which may return only a handful of messages.  Say 
10 at a time, averaging 200 bytes each: that is 200 kbytes sent to receive 
2 kbytes, or a 100:1 noise-to-signal ratio.  Ack.
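The back-of-the-envelope numbers above can be reproduced with a few lines. This is a rough sketch that uses only the figures quoted in this comment (10,000 topics, 20-byte topic names, 10 returned messages of about 200 bytes each); it assumes each polled topic costs one topic name's worth of request bytes and ignores offsets, fetch sizes and framing, so real requests would be somewhat larger.

```python
# Rough request-overhead estimate for polling many topics in one request.
# Assumption: each polled topic costs about one topic name of request bytes;
# offsets, fetch sizes and protocol framing are ignored.


def noise_to_signal(num_topics, avg_topic_name_bytes,
                    messages_returned, avg_message_bytes):
    """Return (request_bytes, payload_bytes, request_bytes / payload_bytes)."""
    request_bytes = num_topics * avg_topic_name_bytes
    payload_bytes = messages_returned * avg_message_bytes
    return request_bytes, payload_bytes, request_bytes / payload_bytes


request, payload, ratio = noise_to_signal(10_000, 20, 10, 200)
print(f"request ~{request} bytes, payload ~{payload} bytes, ratio {ratio:.0f}:1")
# prints: request ~200000 bytes, payload ~2000 bytes, ratio 100:1
```

Because the request size grows with the number of topics polled while the payload grows only with the number of topics that actually have data, the ratio gets worse as the interest set grows and traffic stays sparse.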

KAFKA-170 only addresses non-blocking consumption in the consumer, which is 
purely a client-side implementation issue.  I know that KAFKA-170 can be done 
in the client using the current TCP protocol because I've already done it 
using a NodeJS client.


                
> Efficient polling mechanism for many topics
> -------------------------------------------
>
>                 Key: KAFKA-208
>                 URL: https://issues.apache.org/jira/browse/KAFKA-208
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Taylor Gautier
>
> Currently, the way to poll many topics is to submit a request for each one in 
> turn, and read the responses.  Especially if there are few messages delivered 
> on many topics, the network overhead to implement this scheme can far 
> outweigh the bandwidth of the actual messages delivered.
> Effectively, for topics A, B, C the request/response scheme is the following:
> -> Request A offset a
> -> Request B offset b
> -> Request C offset c
> <- no messages
> <- 1 message offset b
> <- no messages
> -> Request A offset a
> -> Request B offset b'
> -> Request C offset c
> <- no messages
> <- no messages
> <- no messages
> etc.
> I propose a more efficient mechanism which works a bit like epoll in that the 
> client can register interest in a particular topic.  There are many options 
> for the implementation, but the one I suggest goes like so:
> -> Register interest A offset a
> -> Register interest B offset b
> -> Register interest C offset c
> -> Next message (null)
> <- messages for B (1 message)
> -> Next message (topic B offset b')
> <- no messages
> -> Unregister interest C
> ...
> It is possible to implement the "Next Message" request as either blocking or 
> non-blocking.  I suggest that the request format include space for a 
> timeout: if set to 0, the response is non-blocking; if set to anything other 
> than 0, the request blocks for at most timeout ms. 
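The register-interest / next-message exchange proposed above can be modelled in a few lines. This is a minimal single-process sketch, not Kafka code: the class `InterestSession` and its methods `register_interest`, `unregister_interest`, `next_message` and `append` are hypothetical names, and a real broker would keep this state per client connection and speak it over the wire. It assumes offsets are plain message indexes and that the client advances an offset explicitly, as in "Next message (topic B offset b')". A `timeout_ms` of 0 returns immediately; any other value waits at most that many milliseconds for data on any registered topic.

```python
import threading
from typing import Dict, List, Optional, Tuple


class InterestSession:
    """Hypothetical in-memory model of the proposed epoll-like exchange.
    Not a Kafka API; offsets are simply list indexes into each topic's log."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[bytes]] = {}  # topic -> messages
        self._interest: Dict[str, int] = {}      # topic -> next offset wanted
        self._cond = threading.Condition()

    def append(self, topic: str, payload: bytes) -> None:
        # Producer side: store the message and wake any blocked next_message.
        with self._cond:
            self._logs.setdefault(topic, []).append(payload)
            self._cond.notify_all()

    def register_interest(self, topic: str, offset: int) -> None:
        with self._cond:
            self._interest[topic] = offset

    def unregister_interest(self, topic: str) -> None:
        with self._cond:
            self._interest.pop(topic, None)

    def _ready(self) -> Optional[str]:
        # First registered topic whose log extends past the client's offset.
        for topic, offset in self._interest.items():
            if len(self._logs.get(topic, [])) > offset:
                return topic
        return None

    def next_message(self, timeout_ms: int,
                     ack: Optional[Tuple[str, int]] = None
                     ) -> Optional[Tuple[str, int, List[bytes]]]:
        """Return (topic, start_offset, messages) for one topic with new
        data, or None. `ack` optionally moves one topic's offset forward
        first. timeout_ms == 0 never blocks."""
        with self._cond:
            if ack is not None and ack[0] in self._interest:
                self._interest[ack[0]] = ack[1]
            if timeout_ms > 0:
                # Sleep until some registered topic has data or time runs out.
                self._cond.wait_for(lambda: self._ready() is not None,
                                    timeout=timeout_ms / 1000.0)
            topic = self._ready()
            if topic is None:
                return None
            start = self._interest[topic]
            return topic, start, self._logs[topic][start:]


session = InterestSession()
for name in ("A", "B", "C"):
    session.register_interest(name, 0)
print(session.next_message(timeout_ms=0))                 # None
session.append("B", b"hello")
print(session.next_message(timeout_ms=100))               # ('B', 0, [b'hello'])
print(session.next_message(timeout_ms=0, ack=("B", 1)))  # None
session.unregister_interest("C")
threading.Timer(0.05, session.append, args=("A", b"late")).start()
print(session.next_message(timeout_ms=1000))  # waits until the timer appends to A
```

The point of the design is that an idle poll costs one small request regardless of how many topics are registered; the interest set is sent once and only changes (new offsets, unregistrations) travel afterwards.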

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
