[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery
[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710240#comment-15710240 ]

Sönke Liebau commented on KAFKA-3834:
-------------------------------------

Does anybody know whether this is being discussed somewhere else that I couldn't find? In other words, is there any news on this? :)

I'm currently running into this behavior when using commitSync(): when the broker dies, the calling thread simply blocks indefinitely until the broker comes back. As stated above, that is fine for a transient failure, but I think a timeout makes sense here.

The same issue occurs when starting a consumer with a mistyped IP address: it never errors out. Granted, you'll notice at some point that you are not sending any messages, but I think this could be improved.

> Consumer should not block in poll on coordinator discovery
> ----------------------------------------------------------
>
>                 Key: KAFKA-3834
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3834
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>
> Currently we block indefinitely in poll() when discovering the coordinator
> for the group. Instead, we can return an empty record set when the passed
> timeout expires. The downside is that it may obscure the underlying problem
> (which is usually misconfiguration), but users typically have to look at the
> logs to figure out the problem anyway.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
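One possible workaround for the indefinitely blocking commit described above is a watchdog that aborts the call after a deadline. The sketch below is only an illustration under stated assumptions: `WakeableClient`, `WokenUpException`, and `BoundedCommit` are hypothetical stand-ins written for this example, not Kafka classes. It assumes the client offers a `wakeup()` that is safe to call from another thread and that makes the blocked call throw, which is how the Java consumer documents `KafkaConsumer.wakeup()` and `WakeupException`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the two consumer operations this sketch needs.
interface WakeableClient {
    void commitSync(); // may block for as long as the coordinator is unreachable
    void wakeup();     // assumed safe to call from another thread
}

// Hypothetical signal that a blocking call was aborted by wakeup().
final class WokenUpException extends RuntimeException { }

final class BoundedCommit {
    // Returns true if commitSync() completed, false if the watchdog aborted it.
    static boolean commitWithTimeout(WakeableClient client, long timeoutMs) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        Runnable fire = client::wakeup;
        // Arm the alarm before entering the blocking call.
        ScheduledFuture<?> alarm = watchdog.schedule(fire, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            client.commitSync();
            return true;
        } catch (WokenUpException e) {
            return false; // the alarm fired while we were blocked
        } finally {
            alarm.cancel(false);    // has no effect if the alarm already ran
            watchdog.shutdownNow(); // release the watchdog thread
        }
    }
}
```

A caller would treat a `false` result as "commit not confirmed" and decide whether to retry or fail. One caveat of this design: if the alarm fires just after `commitSync()` returns but before it is cancelled, the wakeup may instead affect the client's next blocking call, so real code would need to tolerate a stray wakeup.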
[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery
[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339050#comment-15339050 ]

Ewen Cheslack-Postava commented on KAFKA-3834:
----------------------------------------------

Is returning an empty record set even what we want to do? Why not just throw an exception that indicates the underlying error? That has the benefit of making applications aware of the real issue, and hopefully relieves you of having to answer questions about why they are getting errors.
[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery
[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347486#comment-15347486 ]

Jason Gustafson commented on KAFKA-3834:
----------------------------------------

[~ewencp] It depends on the nature of the problem, I guess. We usually treat coordinator unavailability as a transient problem that will eventually recover. For example, when an offsets partition is being migrated to another broker, there is a short window in which we cannot determine which broker is the coordinator. In these cases, I think it makes more sense to retry.

However, if we cannot communicate with any broker (that is, we can't fetch topic metadata), then raising an exception may be preferable. This latter case has been discussed before and I'm not attempting a solution here. My suggestion is to keep the current behavior, but not block poll().
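The suggestion above (keep retrying coordinator discovery, but stop blocking once the poll timeout expires) can be sketched roughly as follows. This is not the consumer's actual implementation: `BoundedPoll`, the `findCoordinator` supplier, the `fetch` function, and the `retryBackoffMs` parameter are all hypothetical names chosen for this illustration. An empty `Optional` models the transient "coordinator not available" case.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

final class BoundedPoll {
    // Retries coordinator discovery until it succeeds or timeoutMs elapses.
    // On timeout it returns an empty record list instead of blocking further.
    static <R> List<R> poll(Supplier<Optional<String>> findCoordinator,
                            Function<String, List<R>> fetch,
                            long timeoutMs,
                            long retryBackoffMs) {
        long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            Optional<String> coordinator = findCoordinator.get();
            if (coordinator.isPresent()) {
                return fetch.apply(coordinator.get());
            }
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNs - System.nanoTime());
            if (remainingMs <= 0) {
                return Collections.emptyList(); // timeout expired, coordinator still unknown
            }
            try {
                // Back off before retrying, but never sleep past the deadline.
                Thread.sleep(Math.min(retryBackoffMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Collections.emptyList();
            }
        }
    }
}
```

With this shape the caller cannot tell "no records yet" from "no coordinator yet", which is the trade-off the issue description already acknowledges. The alternative raised earlier in the thread, throwing an exception, would replace the empty-list return on timeout.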
[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery
[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350268#comment-15350268 ]

Peter Davis commented on KAFKA-3834:
------------------------------------

I believe we have seen this issue IRL when the new coordinator takes a long time to become available after an election. This can happen if log compaction has halted (for example, due to a too-small I/O buffer): __consumer_offsets then grows ridiculously large. In one instance it was taking the coordinators several minutes to come online before we realized the problem. Meanwhile, poll() would spin and log red-herring errors every 100 ms.

This also occurs with commitSync(), which I believe calls poll() internally but also has a "while" loop of its own. Should improving the blocking behavior of commitSync() be a separate JIRA?