[ https://issues.apache.org/jira/browse/KAFKA-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527018#comment-15527018 ]
Michael Coon commented on KAFKA-4224:
-------------------------------------

As I said, line 278 of NetworkClient (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java) is where the stack trace is being logged. It's catching java.lang.Exception, so anything that isn't an Error or a bare Throwable gets caught and just logged. I have no visibility into the problem from my consumer.poll() call: I just get back empty records (or, with multiple partitions assigned, one partition that never makes progress).

If, instead of catching java.lang.Exception, the Fetcher callback invoked on line 278 could throw a more detailed exception, and NetworkClient could pass that up to my app code somehow, that's all I would need to know something is wrong with a particular partition at a particular offset. I understand why the code punts and just logs the problem--at that point it doesn't know which partitions in the body of the ClientResponse are failing. That doesn't get fleshed out until further down in the callback code. So something further down would have to pass back some kind of partial status indicating which partitions failed, for what reason, and at what offset. That would be ideal. But again, I get why they just log it here--there is no mechanism for partial results.

As it stands, if my consumer is managing multiple partitions, it has no way of knowing there is a problem with a particular partition unless it checks the progress of every partition after each poll call. That's the part that's highly inefficient. (A sketch of that per-poll check is at the bottom of this message.)

> IndexOutOfBounds in RecordsIterator causes infinite loop in NetworkClient
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4224
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.0.1
>            Reporter: Michael Coon
>
> For whatever reason, I seem to have a corrupted message returned from a broker
> that puts the consumer into an infinite loop. The
> org.apache.kafka.clients.consumer.internals.Fetcher (line 590) gets the next
> record from the RecordsIterator of MemoryRecords, but when it attempts to
> decode the record it throws an "IndexOutOfBounds" exception. Unfortunately,
> that exception is merely logged, and the Fetcher goes on to get the next
> message. But the exception path apparently does not move the underlying buffer
> read position forward, so it never actually reaches the next record. The
> result: it keeps trying to read the corrupted record and can't make progress.
> I offer two potential solutions:
> 1) Throw the exception up to me and let me figure out whether I want to skip
> forward in offsets.
> 2) Make sure the underlying RecordsIterator actually moves forward on
> exceptions so that progress can be made when corrupted messages are found.
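To make the workaround above concrete, here is a minimal sketch of the per-poll progress check, assuming the plain Java consumer API; the class name and stall threshold are invented for illustration. Note that it cannot tell a partition stuck on a corrupted record apart from one that is simply idle at the end of the log, which is part of why this approach is so unsatisfying.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helper (not part of the Kafka client): tracks consumer.position()
// across polls and flags partitions whose position never advances.
public class PartitionProgressMonitor {
    private final Map<TopicPartition, Long> lastPositions = new HashMap<>();
    private final Map<TopicPartition, Integer> stalledPolls = new HashMap<>();
    private final int stallThreshold; // polls without progress before flagging

    public PartitionProgressMonitor(int stallThreshold) {
        this.stallThreshold = stallThreshold;
    }

    /** Call once after every poll(); returns partitions that appear stuck. */
    public Map<TopicPartition, Long> checkProgress(KafkaConsumer<?, ?> consumer) {
        Map<TopicPartition, Long> stuck = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            long position = consumer.position(tp);
            Long previous = lastPositions.put(tp, position);
            if (previous != null && previous == position) {
                int polls = stalledPolls.merge(tp, 1, Integer::sum);
                if (polls >= stallThreshold) {
                    stuck.put(tp, position); // no movement for stallThreshold polls
                }
            } else {
                stalledPolls.put(tp, 0); // position advanced; reset the counter
            }
        }
        return stuck;
    }
}
{code}

A caller could then seek past the suspect offset (consumer.seek(tp, stuckPosition + 1)) to restore progress, but it is guessing: it scans every assigned partition on every poll and still doesn't know why the partition stopped, which is exactly the inefficiency described above.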
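For comparison, solution 1 from the issue description would look roughly like this from the application side. The FetchCorruptionException type and its accessors do not exist in the Kafka client; they are invented here purely to show the shape of the API being asked for.

{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

// Hypothetical exception -- NOT part of the Kafka client API today.
class FetchCorruptionException extends KafkaException {
    private final TopicPartition partition;
    private final long offset;

    FetchCorruptionException(TopicPartition partition, long offset, Throwable cause) {
        super("Corrupt record in " + partition + " at offset " + offset, cause);
        this.partition = partition;
        this.offset = offset;
    }

    TopicPartition partition() { return partition; }
    long offset() { return offset; }
}

class CorruptionAwarePoller {
    // One iteration of the poll loop, assuming poll() propagated the
    // hypothetical exception instead of NetworkClient swallowing it.
    static void pollOnce(KafkaConsumer<byte[], byte[]> consumer) {
        try {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            // ... hand records off to application processing ...
        } catch (FetchCorruptionException e) {
            // The failing partition and offset are now known, so the app can
            // decide to skip the corrupted record rather than loop forever.
            consumer.seek(e.partition(), e.offset() + 1);
        }
    }
}
{code}

Solution 2 (having RecordsIterator advance past the bad record internally) would need no API change, but it skips data silently; surfacing the partition and offset as above at least leaves that call to the application.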