[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery

2016-11-30 JIRA

[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710240#comment-15710240 ]

Sönke Liebau commented on KAFKA-3834:
-

Does anybody know whether this is being discussed somewhere else that I 
couldn't find? In other words, is there any news on this? :)

I'm currently running into this behavior when using commitSync(): when the 
broker dies, the writing thread simply blocks indefinitely and does nothing 
until the broker comes back. As stated above, that is fine if the failure is 
transient, but I think a timeout makes sense here. The same issue occurs when 
starting a consumer with a mistyped IP address: it never errors out. Granted, 
at some point you'll notice that you are not sending any messages, but I think 
this could be improved.

> Consumer should not block in poll on coordinator discovery
> --
>
> Key: KAFKA-3834
> URL: https://issues.apache.org/jira/browse/KAFKA-3834
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> Currently we block indefinitely in poll() when discovering the coordinator 
> for the group. Instead, we can return an empty record set when the passed 
> timeout expires. The downside is that it may obscure the underlying problem 
> (which is usually misconfiguration), but users typically have to look at the 
> logs to figure out the problem anyway. 
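The proposal can be illustrated as a bounded wait: rather than looping until a coordinator is found, poll() gives up when the caller's timeout expires and returns an empty batch. The following is a minimal, self-contained simulation under stated assumptions, not the KafkaConsumer code; `CoordinatorLookup`, `BoundedPoller`, `fetchAfterJoin` and the fixed backoff are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical hook standing in for "find the group coordinator"; not a Kafka API.
interface CoordinatorLookup {
    Optional<String> findCoordinator(); // empty while no coordinator is known
}

final class BoundedPoller {
    private final CoordinatorLookup lookup;
    private final long backoffMs; // hypothetical fixed delay between lookups

    BoundedPoller(CoordinatorLookup lookup, long backoffMs) {
        this.lookup = lookup;
        this.backoffMs = backoffMs;
    }

    // Retries discovery until timeoutMs expires, then returns an empty batch
    // instead of blocking indefinitely.
    List<String> poll(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            Optional<String> coordinator = lookup.findCoordinator();
            if (coordinator.isPresent()) {
                return fetchAfterJoin(coordinator.get());
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Collections.emptyList(); // timeout expired: empty record set
            }
            // Back off before the next lookup, but never sleep past the deadline.
            TimeUnit.NANOSECONDS.sleep(Math.min(TimeUnit.MILLISECONDS.toNanos(backoffMs), remainingNanos));
        }
    }

    // Placeholder: a real consumer would join the group via the coordinator
    // and then fetch from partition leaders. Here we return one fake record.
    private List<String> fetchAfterJoin(String coordinator) {
        return Collections.singletonList("record (group coordinator: " + coordinator + ")");
    }
}
```

With a lookup that never succeeds, `new BoundedPoller(() -> Optional.<String>empty(), 10).poll(50)` returns an empty list after roughly 50 ms instead of hanging; as the description notes, the cost is that the caller no longer sees why nothing arrived.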



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery

2016-06-19 Ewen Cheslack-Postava (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339050#comment-15339050 ]

Ewen Cheslack-Postava commented on KAFKA-3834:
--

Is returning an empty record set even what we want to do? Why not just throw an 
exception that indicates the underlying error? That would have the benefit of 
making applications aware of the real issue and, hopefully, sparing you from 
having to answer questions about why they are getting errors.



[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery

2016-06-23 Jason Gustafson (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347486#comment-15347486 ]

Jason Gustafson commented on KAFKA-3834:


[~ewencp] It depends on the nature of the problem, I guess. We usually treat 
coordinator unavailability as a transient problem that will eventually 
recover. For example, when an offsets partition is being migrated to another 
broker, there is a short window during which we can't determine which broker 
is the coordinator. In these cases, I think it makes more sense to retry. 
However, if we cannot communicate with any broker (that is, we can't fetch 
topic metadata), then raising an exception may be preferable. This latter case 
has been discussed before, and I'm not attempting a solution for it here. My 
suggestion is to keep the current behavior, but not block in poll().
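The distinction drawn here (retry while the coordinator is merely unknown, fail when no broker can be reached at all) can be sketched as a small decision loop. This is an illustration of the trade-off discussed in the thread, not the behavior the ticket proposes; the no-broker case is explicitly out of scope above. `LookupOutcome`, `PollResult`, `DiscoveryPolicy` and `BrokersUnreachableException` are hypothetical, and an attempt budget stands in for poll()'s timeout so the sketch is deterministic.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical outcomes of one coordinator-discovery attempt (not Kafka types).
enum LookupOutcome { FOUND, COORDINATOR_NOT_AVAILABLE, NO_BROKER_REACHABLE }

enum PollResult { RECORDS, EMPTY }

// Hypothetical unchecked exception for the "cannot reach any broker" case.
class BrokersUnreachableException extends RuntimeException {
    BrokersUnreachableException(String message) { super(message); }
}

final class DiscoveryPolicy {
    private DiscoveryPolicy() {}

    // Consumes scripted lookup outcomes; maxAttempts stands in for the poll timeout.
    static PollResult poll(int maxAttempts, LookupOutcome... scripted) {
        Iterator<LookupOutcome> outcomes = Arrays.asList(scripted).iterator();
        for (int attempt = 0; attempt < maxAttempts && outcomes.hasNext(); attempt++) {
            switch (outcomes.next()) {
                case FOUND:
                    return PollResult.RECORDS; // coordinator known: proceed as usual
                case COORDINATOR_NOT_AVAILABLE:
                    continue; // transient (e.g. offsets partition migrating): retry
                case NO_BROKER_REACHABLE:
                    // Not even metadata can be fetched: surface the error to the caller.
                    throw new BrokersUnreachableException("cannot fetch metadata from any broker");
            }
        }
        return PollResult.EMPTY; // budget used up while still transient: empty batch
    }
}
```

For example, `DiscoveryPolicy.poll(2, COORDINATOR_NOT_AVAILABLE, COORDINATOR_NOT_AVAILABLE, FOUND)` returns `EMPTY` because the budget runs out first, while a single `NO_BROKER_REACHABLE` outcome throws immediately.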



[jira] [Commented] (KAFKA-3834) Consumer should not block in poll on coordinator discovery

2016-06-26 Peter Davis (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350268#comment-15350268 ]

Peter Davis commented on KAFKA-3834:


I believe we have seen this issue in real life when the new coordinator takes a 
long time to become available after an election. This can happen if log 
compaction has halted (for example, due to a too-small I/O buffer), in which 
case __consumer_offsets grows extremely large; in one instance, the 
coordinators took several minutes to come online before we realized the 
problem. Meanwhile, poll() would spin and log red-herring errors every 100 ms.

This also occurs in commitSync(), which I believe calls poll() internally but 
also has a "while" loop of its own. Should improving the blocking behavior of 
commitSync() be a separate JIRA?
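A bounded version of a commit retry loop like the one described above could look like the sketch below: it retries a commit attempt with a fixed backoff and raises an error once a deadline passes, instead of spinning indefinitely. This is a hedged illustration, not Kafka's commitSync(); `BoundedCommit`, `CommitTimeoutException` and the `BooleanSupplier` commit hook are hypothetical stand-ins.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical unchecked exception signalling that the commit deadline passed.
class CommitTimeoutException extends RuntimeException {
    CommitTimeoutException(String message) { super(message); }
}

final class BoundedCommit {
    private BoundedCommit() {}

    /**
     * Calls commitAttempt until it returns true or timeoutMs elapses.
     * commitAttempt is a hypothetical hook: true means the offsets were
     * committed, false means the coordinator is not available yet (retriable).
     * Returns the number of attempts made.
     */
    static int commitWithTimeout(BooleanSupplier commitAttempt, long timeoutMs, long backoffMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int attempts = 0;
        while (true) {
            attempts++;
            if (commitAttempt.getAsBoolean()) {
                return attempts;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new CommitTimeoutException(
                        "commit not completed within " + timeoutMs + " ms (" + attempts + " attempts)");
            }
            // Back off before retrying, but never sleep past the deadline.
            TimeUnit.NANOSECONDS.sleep(Math.min(TimeUnit.MILLISECONDS.toNanos(backoffMs), remainingNanos));
        }
    }
}
```

With a 100 ms backoff the loop would retry at roughly the interval mentioned above, but it still terminates with an error once the timeout elapses, so the caller learns that something is wrong.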
