[ https://issues.apache.org/jira/browse/KAFKA-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527018#comment-15527018 ]

Michael Coon commented on KAFKA-4224:
-------------------------------------

As I said, line 278 of NetworkClient 
(https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java)
is where the stack trace is being logged. It's catching java.lang.Exception, 
so nearly any failure short of an Error is caught and just logged. I have no 
visibility into the problem from my consumer.poll() call: I just get back 
empty records (or, in the case of a consumer with multiple partitions 
assigned, one partition that never makes progress). If, instead of catching 
java.lang.Exception, the Fetcher callback invoked at line 278 could throw a 
more detailed exception, and NetworkClient could pass that up to my 
application code somehow, that's all I would need to know something is wrong 
with a particular partition at a particular offset. I understand why the code 
punts and just logs the problem: at this point it doesn't know which 
partitions in the body of the ClientResponse are failing. That doesn't get 
fleshed out until further down in the callback code. That's why something 
further down would have to pass back some kind of partial status indicating 
which partitions failed, for what reason, and at what offset. That would be 
ideal. But again, I get why they just log it at this point: there is no 
mechanism for partial results.
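
To make that concrete, the partial status could be surfaced as something 
like the sketch below. This is purely hypothetical: no such type exists in 
the client today, and the class name is invented for illustration.

    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical sketch only: a fetch failure that names the partition
    // and offset that caused it, so the application can react to it.
    public class PartitionFetchException extends KafkaException {
        private final TopicPartition partition;
        private final long offset;

        public PartitionFetchException(TopicPartition partition,
                                       long offset, Throwable cause) {
            super("Fetch failed for " + partition + " at offset " + offset,
                  cause);
            this.partition = partition;
            this.offset = offset;
        }

        public TopicPartition partition() { return partition; }
        public long offset() { return offset; }
    }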

As it stands, if my consumer is managing multiple partitions, it has no way 
of knowing that a particular partition is in trouble unless it checks the 
progress of every partition after each poll call. That's the part that's 
highly inefficient.
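
For reference, that workaround looks roughly like the sketch below. It 
assumes a consumer that is already subscribed, and note the caveat: by this 
check alone, an idle partition with no new data is indistinguishable from 
one stuck on a corrupted record.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // After every poll, compare each assigned partition's position with
    // its position from the previous poll; a partition that never
    // advances while others do is likely stuck.
    static void pollAndCheckProgress(KafkaConsumer<byte[], byte[]> consumer) {
        Map<TopicPartition, Long> lastPositions = new HashMap<>();
        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            for (TopicPartition tp : consumer.assignment()) {
                long pos = consumer.position(tp);
                Long prev = lastPositions.put(tp, pos);
                if (prev != null && prev == pos) {
                    System.err.println("No progress on " + tp
                            + " at offset " + pos);
                }
            }
            // ...process records...
        }
    }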
 

> IndexOutOfBounds in RecordsIterator causes infinite loop in NetworkClient
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4224
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.1
>            Reporter: Michael Coon
>
> For whatever reason, I seem to have a corrupted message returned from a 
> broker that puts the consumer into an infinite loop. 
> org.apache.kafka.clients.consumer.internals.Fetcher (line 590) gets the 
> next record from the RecordsIterator of MemoryRecords, but when it 
> attempts to decode the record it throws an IndexOutOfBoundsException. 
> Unfortunately, that exception is merely logged and the Fetcher goes on to 
> get the next message. But the exception handling apparently does not 
> advance the underlying buffer position, so the next read hits the same 
> record. The result: it keeps trying to read the corrupted record but 
> can't make progress. 
> I offer two potential solutions: 
> 1) Throw the exception up to me and let me decide whether I want to skip 
> forward in offsets (sketched below the description).
> 2) Make sure the underlying RecordsIterator actually moves forward on 
> exceptions so that progress can be made when corrupted messages are found.
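
To illustrate solution (1): if poll() threw a partition-aware exception 
(such as the hypothetical PartitionFetchException sketched earlier in this 
thread), consumer-side recovery could look roughly like this. seek() is the 
real client API; the exception type is invented.

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Sketch: skip past the corrupted record so the partition can make
    // progress, at the cost of losing that one record.
    static void pollSkippingCorruptRecords(
            KafkaConsumer<byte[], byte[]> consumer) {
        try {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            // ...process records...
        } catch (PartitionFetchException e) {
            consumer.seek(e.partition(), e.offset() + 1);
        }
    }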



