[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611104#comment-14611104 ] Guozhang Wang commented on KAFKA-2168: -- Committed the follow-up patch to trunk. Closing. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, KAFKA-2168_2015-06-23_09:39:07.patch, KAFKA-2168_2015-06-30_10:54:22.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608743#comment-14608743 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, KAFKA-2168_2015-06-23_09:39:07.patch, KAFKA-2168_2015-06-30_10:54:22.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608753#comment-14608753 ] Jason Gustafson commented on KAFKA-2168: [~ewencp], there were some minor issues from the code reviews that I've tried to address in the recent patch. Once these are accepted, we had better stick a fork in it. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, KAFKA-2168_2015-06-23_09:39:07.patch, KAFKA-2168_2015-06-30_10:54:22.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14607548#comment-14607548 ] Ewen Cheslack-Postava commented on KAFKA-2168: -- This version was committed to trunk. Were we expecting any follow up patches in this JIRA or should we close this? New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, KAFKA-2168_2015-06-23_09:39:07.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597914#comment-14597914 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, KAFKA-2168_2015-06-23_09:39:07.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596850#comment-14596850 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593579#comment-14593579 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, KAFKA-2168_2015-06-19_09:19:02.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592582#comment-14592582 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Sub-task Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582524#comment-14582524 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582528#comment-14582528 ] Jay Kreps commented on KAFKA-2168: -- Hey [~guozhang], have you had a chance to look at this? It would be good to get your thoughts as it relates somewhat to the refactoring you did... New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, KAFKA-2168_2015-06-11_14:09:59.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575032#comment-14575032 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575088#comment-14575088 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573627#comment-14573627 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571944#comment-14571944 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572083#comment-14572083 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, KAFKA-2168_2015-06-03_21:06:42.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570036#comment-14570036 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, KAFKA-2168_2015-06-02_17:09:37.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568198#comment-14568198 ] Jason Gustafson commented on KAFKA-2168: Updated reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568212#comment-14568212 ] Jason Gustafson commented on KAFKA-2168: The most recent patch attempts to follow [~guozhang]'s overall advice above. Most of the calls are still blocking, but I have moved the blocking code out of Coordinator/Fetcher and into KafkaConsumer. This makes it possible to use wakeup() from the consumer without splitting the logic across multiple classes. The consumer is no longer synchronized, which makes it unsafe for multi-threaded access, but wakeup() can be safely used from other threads. This should also resolve KAFKA-2230. Note also that this patch will likely have to be updated if KAFKA-2123 is accepted. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563832#comment-14563832 ] Jason Gustafson commented on KAFKA-2168: Created reviewboard https://reviews.apache.org/r/34789/diff/ against branch upstream/trunk New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563843#comment-14563843 ] Jason Gustafson commented on KAFKA-2168: I've added a patch which removes synchronization and allows a prompt close (using the wakeup call on the underlying selector). It does not expose the wakeup call in the consumer interface, however, since that seems to be a bit trickier. I think we may want to move that to a separate ticket. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563753#comment-14563753 ] Guozhang Wang commented on KAFKA-2168: -- I would also prefer to stick with single-threaded consumer usage, and I agree that KAFKA-2123 would be important to have then. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563968#comment-14563968 ] Jason Gustafson commented on KAFKA-2168: This patch is a work in progress. Using the wakeup() method in order to invoke close() on NetworkClient does not work as easily as I thought it would due to the completeAll() methods which invoke poll() in a loop. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson Attachments: KAFKA-2168.patch The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563212#comment-14563212 ] Jason Gustafson commented on KAFKA-2168: Talked with Neha, Ewen, and Jay last night. Consensus was to remove the synchronization of KafkaConsumer and provide a wakeup() method which can be used to interrupt a long poll. This should solve the issue from this ticket, though it may hinge on KAFKA-1894, which removes polling loops from the current consumer. Note that this explicitly makes the consumer unsafe for multi-threaded access, though we will provide a thread-safe close() method which can be called (for example) from a shutdown hook. I'm going to update this ticket to reflect this change and submit a patch. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14562061#comment-14562061 ] Neha Narkhede commented on KAFKA-2168: -- There are tradeoffs to having multiple threads per consumer instance vs having a consumer instance per thread. The consumer code is simpler in the latter design, the throughput is better but the # of TCP connections are fewer in the former design. Some of the concerns [~ewencp] brings up above can be mitigated if there is a separate consumer instance per user thread and others can be mitigated by the user picking the right timeout on poll() that they are comfortable blocking on. All of this would mean explicitly stating that the consumer APIs are not threadsafe and that the user should create multiple consumer instances across threads instead of sharing one. We still need to make sure close() can be called from a separate thread as [~ewencp] correctly points out, though the change isn't large if we go down this route. It seems like it is simpler to stick to the original intention of the design and not share consumer instances across threads? New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559650#comment-14559650 ] Ewen Cheslack-Postava commented on KAFKA-2168: -- Some reasons you might want to use the consumer from multiple threads: 1. I don't think it's necessarily addressed by this JIRA, but if processing messages is expensive, or the processing code is easier to write as synchronous calls even if it requires accessing some network resource, you might want multiple threads to be able to call poll(). This should already behave correctly. 2. Manage offset commits in a separate thread from polling. If you need to coordinate some other action with offset commit, your choices are currently to be careful in computing timeouts for poll() in order to get processing in the same thread or to try committing from another thread. The code for doing this is much simpler to write if you can just fire up a thread that does sync commit + whatever other operation you need to do, then sleeps for the next commit interval. If you do this right you can continue processing messages during the offset commit, even if it ends up delayed for some reason. 3. close() is probably the most obvious case given the feedback we've had on the producer's close() method blocking indefinitely -- you want to be able to close() from a separate thread if you keep a thread dedicated to poll()ing. For example, using a shutdown hook requires this. The feedback on the producer made it clear this is important and should also have a timeout. 4. Metrics. MetricsReporter is the right way to get metrics, but that only works if what you care about is already covered. I don't think per topic-partition position() and committed() are currently reported -- not sure what the plan is there since reporting metrics in something like mirrormaker might be too much, but some applications will want to be able to track that info in metrics. This is another case where just firing up a thread to periodically check the state of the consumer and report it via whatever metrics package they use is probably the easiest implementation. 5. Any time you may need to make dynamic changes to the consumer in response to external events. For example, consider a mirrormaker-like service. If you want to be able to dynamically reconfigure the consumer to add new topics to the job, subscribe() will block indefinitely if poll() has a long timeout and new data isn't flowing in to the topics you're already subscribed to. A wakeup() method isn't good enough here since you need to manage the subsequent race between the thread trying to subscribe() and the poll()ing thread. At a minimum, the current state where thread safety is guaranteed in the javadoc but we have indefinite blocking is a problem. If we want it to be a single-threaded API, then we should just leave locking up to the user (although we'd probably still at least want some sort of wakeup() method so they could interrupt long poll() calls). New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554724#comment-14554724 ] Jason Gustafson commented on KAFKA-2168: I feel a little wary about finer-grained synchronization given all the state in the consumer, the network client, and the selector. I actually think the two-lock approach is the least intrusive since it only touches the KafkaConsumer and preserves the current coarse synchronization design, but I agree that it's unusual. Here's an idea of what it might look like in the code: {code:java} lock.queue(); try { client.wakeup(); lock.lock(); // critical section } finally { lock.unlock() } {code} Definitely weird, but not that hard to understand. You'd still run into the same problem if multiple threads are trying to poll, but that seems like unintended usage anyway. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554743#comment-14554743 ] Jason Gustafson commented on KAFKA-2168: Note that the coordinator has a couple cases where poll is called in a loop. There's a separate issue to fix this: KAFKA-1894. Might want to hold off on this until that is resolved. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554867#comment-14554867 ] Jason Gustafson commented on KAFKA-2168: Yes, that is the tradeoff, but at least it would be confined to the KafkaConsumer class. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554799#comment-14554799 ] Ewen Cheslack-Postava commented on KAFKA-2168: -- But isn't that going to make a mess of all the methods in Kafka consumer since we need to do this everywhere we currently synchronize? And I don't see a good way of providing it as a generic utility since you need the body of the method within the finally block. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553332#comment-14553332 ] Ewen Cheslack-Postava commented on KAFKA-2168: -- For option 1 it's probably worth pointing out that we already have some finer grained synchronization (metrics and metadata since those are shared by many other components, and the producer doesn't have synchronization at the level of KafkaProducer, only on its internals). So we're already double locking in a lot of cases. My concern with option 2 is that it's a pretty unusual approach which makes the code harder to understand. Scanning through the code, there aren't that many places in KafkaConsumer where multiple components are used together in a way that would require synchronization. updateFetchPositions and refreshCommitttedOffsets might since they use subscriptions + fetcher and subscriptions + coordinator together, respectively. Especially with SubscriptionState we'd need to be careful since some of the calls to that return an internal collection flags, and the subsequent operation might need all that processing to be synchronized to be sure not to miss anything. For example, during partition reassignment, which checks a flag, does reassignment, and then resets the flag; we'd need to make sure that a subscription during that time wouldn't get lost. The other case is poll(). I thought this might be hard to reason about if some state was changing while it was executing, but I think it's not a problem as long as a few of the steps can be synchronized, in particular partition reassignment and offset commit. By the way, I mapped out the dependencies. It's sort of in 4 layers with subscriptions + metadata at the bottom, NetworkClient above that using only metadata, then all three are used by both coordinator and fetcher in the next layer, and then the top layer is KafkaConsumer. But KafkaConsumer touches all of them, so kind of breaks any layering. Some of the things in KafkaConsumer that require synchronization still could possibly move into a component in a lower level (possibly something new) if we move some of the code around. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553343#comment-14553343 ] Ewen Cheslack-Postava commented on KAFKA-2168: -- Actually, now I realize another solution is to only remove synchronization from the one place it's a problem -- things that might call NetworkClient.poll() with long timeouts. Could we use synchronized(this) around everything *except* the NetworkClient.poll() calls, and then have anything using NetworkClient synchronize on it? This is finer grained locking still, but I think could end up having pretty minimal impact on the current code. The drawback is that since NetworkClient is used by all the classes, the requirement of locking gets spread across all of them. New consumer poll() can block other calls like position(), commit(), and close() indefinitely - Key: KAFKA-2168 URL: https://issues.apache.org/jira/browse/KAFKA-2168 Project: Kafka Issue Type: Bug Components: clients, consumer Reporter: Ewen Cheslack-Postava Assignee: Jason Gustafson The new consumer is currently using very coarse-grained synchronization. For most methods this isn't a problem since they finish quickly once the lock is acquired, but poll() might run for a long time (and commonly will since polling with long timeouts is a normal use case). This means any operations invoked from another thread may block until the poll() call completes. Some example use cases where this can be a problem: * A shutdown hook is registered to trigger shutdown and invokes close(). It gets invoked from another thread and blocks indefinitely. * User wants to manage offset commit themselves in a background thread. If the commit policy is not purely time based, it's not currently possibly to make sure the call to commit() will be processed promptly. Two possible solutions to this: 1. Make sure a lock is not held during the actual select call. Since we have multiple layers (KafkaConsumer - NetworkClient - Selector - nio Selector) this is probably hard to make work cleanly since locking is currently only performed at the KafkaConsumer level and we'd want it unlocked around a single line of code in Selector. 2. Wake up the selector before synchronizing for certain operations. This would require some additional coordination to make sure the caller of wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() thread being woken up and then promptly reacquiring the lock with a subsequent long poll() call). -- This message was sent by Atlassian JIRA (v6.3.4#6332)