[ 
https://issues.apache.org/jira/browse/KAFKA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169381#comment-16169381
 ] 

ASF GitHub Bot commented on KAFKA-5896:
---------------------------------------

GitHub user 56quarters reopened a pull request:

    https://github.com/apache/kafka/pull/3876

    KAFKA-5896: Force Connect tasks to stop via thread interruption

    Interrupt the thread of Kafka Connect tasks that do not stop within
    the timeout via `Worker::stopAndAwaitTasks()`. Previously tasks would
    be asked to stop via setting a `stopping` flag. It was possible for
    tasks to ignore this flag if they were, for example, waiting for
    a lock or blocked on I/O.
    
    This prevents issues where tasks may end up with multiple threads
    all running and attempting to make progress when there should only
    be a single thread running for that task at a time.
    
    Fixes KAFKA-5896
    
    /cc @rhauch @tedyu 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smarter-travel-media/kafka force-task-stop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3876
    
----
commit 31c879c1a1f0bd4f5999c021baca8e99e733ffe1
Author: Nick Pillitteri <ni...@smartertravelmedia.com>
Date:   2017-09-13T14:54:40Z

    Force Connect tasks to stop via thread interruption after a timeout
    
    Interrupt the thread of Kafka Connect tasks that do not stop within
    the timeout via Worker::stopAndAwaitTasks(). Previously tasks would
    be asked to stop via setting a `stopping` flag. It was possible for
    tasks to ignore this flag if they were, for example, waiting for
    a lock or blocked on I/O.
    
    This prevents issues where tasks may end up with multiple threads
    all running and attempting to make progress when there should only
    be a single thread running for that task at a time.
    
    Fixes KAFKA-5896

commit ed3ef9c1f139cae4f09eefe1f66edc1c58c5ace6
Author: Nick Pillitteri <ni...@smartertravelmedia.com>
Date:   2017-09-15T19:54:15Z

    Rename per CR

commit afea834c708eae2b7d3dedbdeafed83197eaed94
Author: Nick Pillitteri <n...@tshlabs.org>
Date:   2017-09-16T04:52:57Z

    Rename per CR

----


> Kafka Connect task threads never interrupted
> --------------------------------------------
>
>                 Key: KAFKA-5896
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5896
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Nick Pillitteri
>            Priority: Minor
>
> h2. Problem
> Kafka Connect tasks associated with connectors are run in their own threads. 
> When tasks are stopped or restarted, a flag is set - {{stopping}} - to 
> indicate the task should stop processing records. However, if the thread the 
> task is running in is blocked (waiting for a lock or performing I/O) it's 
> possible the task will never stop.
> I've created a connector specifically to demonstrate this issue (along with 
> some more detailed instructions for reproducing the issue): 
> https://github.com/smarter-travel-media/hang-connector
> I believe this is an issue because it means that a single badly behaved 
> connector (any connector that does I/O without timeouts) can cause the Kafka 
> Connect worker to get into a state where the only solution is to restart the 
> JVM.
> I think, but couldn't reproduce, that this is the cause of this problem on 
> Stack Overflow: 
> https://stackoverflow.com/questions/43802156/inconsistent-connector-state-connectexception-task-already-exists-in-this-work
> h2. Expected Result
> I would expect the Worker to eventually interrupt the thread that the task is 
> running in. In the past across various other libraries, this is what I've 
> seen done when a thread needs to be forcibly stopped.
> h2. Actual Result
> In actuality, the Worker sets a {{stopping}} flag and lets the thread run 
> indefinitely. It uses a timeout while waiting for the task to stop but after 
> this timeout has expired it simply sets a {{cancelled}} flag. This means that 
> every time a task is restarted, a new thread running the task will be 
> created. Thus a task may end up with multiple instances all running in their 
> own threads when there's only supposed to be a single thread.
> h2. Steps to Reproduce
> The problem can be replicated by using the connector available here: 
> https://github.com/smarter-travel-media/hang-connector
> Apologies for how involved the steps are.
> I've created a patch that forcibly interrupts threads after they fail to 
> gracefully shutdown here: 
> https://github.com/smarter-travel-media/kafka/commit/295c747a9fd82ee8b30556c89c31e0bfcce5a2c5
> I've confirmed that this fixes the issue. I can add some unit tests and 
> submit a PR if people agree that this is a bug and interrupting threads is 
> the right fix.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to