[jira] [Commented] (KAFKA-4848) Stream thread getting into deadlock state while trying to get rocksdb lock in retryWithBackoff

Sachin Mittal (JIRA) Mon, 03 Apr 2017 09:45:55 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953777#comment-15953777
 ]


Sachin Mittal commented on KAFKA-4848:
--------------------------------------

Please let me know whether this will be fixed in the 0.10.2 branch. Do I need 
to open a PR for it?

Also note that some fixes that are already in trunk, such as catching the 
commit failed exception on offset commits, are not in that branch, and they 
would be a prerequisite for this fix.

So please let me know what the plan is for the 0.10.2.1 release.




> Stream thread getting into deadlock state while trying to get rocksdb lock in 
> retryWithBackoff
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4848
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4848
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Sachin Mittal
>            Assignee: Sachin Mittal
>             Fix For: 0.11.0.0, 0.10.2.1
>
>         Attachments: thr-1
>
>
> We see a deadlock when a streams thread takes longer than 
> MAX_POLL_INTERVAL_MS_CONFIG to process a task. In that case the thread's 
> partitions, including the rocksdb lock, are assigned to some other thread. 
> When the original thread tries to process the next task it cannot get the 
> rocksdb lock and simply keeps waiting for that lock forever.
> In retryWithBackoff for AbstractTaskCreator we have backoffTimeMs = 50L.
> If it does not get the lock, we simply increase the time by 10x and keep 
> trying inside the while true loop.
> We need an upper bound for this backoffTimeMs. If the time is greater 
> than MAX_POLL_INTERVAL_MS_CONFIG and it still hasn't got the lock, it means 
> this thread's partitions have moved somewhere else and it may not get the 
> lock again.
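
The bounded retry proposed in the description could look roughly like the 
sketch below. This is not the Kafka Streams code: the class BoundedBackoff, 
the method retryWithBoundedBackoff, and the parameters tryLock and maxWaitMs 
are hypothetical names used only for illustration. The initial backoff and 
the 10x growth follow the description, and maxWaitMs stands in for the value 
of MAX_POLL_INTERVAL_MS_CONFIG.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; names here are hypothetical, not Kafka's.
final class BoundedBackoff {

    /**
     * Calls tryLock until it succeeds or the total time slept reaches maxWaitMs.
     * Returns true if the lock was obtained, false if the caller should give up
     * (for example because the partitions have probably moved to another thread).
     */
    static boolean retryWithBoundedBackoff(BooleanSupplier tryLock,
                                           long initialBackoffMs,
                                           long maxWaitMs) {
        long backoffMs = initialBackoffMs;
        long waitedMs = 0L;
        while (true) {
            if (tryLock.getAsBoolean()) {
                return true;
            }
            if (waitedMs >= maxWaitMs) {
                // Upper bound reached: stop retrying instead of waiting forever.
                return false;
            }
            // Never sleep past the remaining budget.
            long sleepMs = Math.min(backoffMs, maxWaitMs - waitedMs);
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up.
                Thread.currentThread().interrupt();
                return false;
            }
            waitedMs += sleepMs;
            // Grow by 10x as in the description, but cap the step at the bound.
            backoffMs = Math.min(backoffMs * 10, maxWaitMs);
        }
    }
}
```

A caller would pass 50L as the initial backoff and treat a false return as a 
signal to stop trying to create the task, rather than blocking the thread.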



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
