[ 
https://issues.apache.org/jira/browse/KUDU-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631825#comment-17631825
 ] 

ASF subversion and git services commented on KUDU-3419:
-------------------------------------------------------

Commit 4a8931636eb79d2a3fdee478d7b9e3bc890758b7 in kudu's branch 
refs/heads/master from xinghuayu007
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=4a8931636 ]

[KUDU-3419] Fix the stuck of starting tablet server

If permission to a tablet metadata file is denied while the
tablet server is starting, the tablet server gets stuck and
cannot exit automatically. When the permission error occurs,
the TabletServer object is destroyed, which in turn destroys
its thread pool, txn_status_manager_pool_. That pool still
contains a running task, TxnStalenessTrackerTask, and there is
no code to shut the task down in that case. Therefore the
tablet server gets stuck.
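
The hang described above can be illustrated with a minimal, self-contained
sketch. This is not Kudu's code: PeriodicTask and SimulateFailedStartup are
hypothetical names, and a single std::thread stands in for the thread pool.
The sketch only shows the general pattern, assuming the background task
loops until it is told to stop: the owner's destructor has to request the
stop before joining, otherwise the join never returns.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a periodic background task such as a
// staleness tracker. It is not Kudu's actual class.
class PeriodicTask {
 public:
  PeriodicTask() : worker_([this] { Run(); }) {}

  // Joining alone would block forever, because Run() returns only after
  // stop_ is set. The destructor therefore requests the stop first.
  ~PeriodicTask() { Shutdown(); }

  // Safe to call more than once.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      // Periodic work would go here; then sleep until the next round
      // or until a stop is requested.
      cv_.wait_for(l, std::chrono::milliseconds(10),
                   [this] { return stop_; });
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above exist first
};

// Simulates a startup that fails after the background task was started.
// Returns true once teardown has completed, i.e. nothing hung.
bool SimulateFailedStartup() {
  {
    PeriodicTask task;
    // Pretend that loading tablet metadata failed here (for example with
    // a permission error), so the owner is destroyed on the error path.
  }  // ~PeriodicTask() stops the task and joins the thread.
  return true;
}
```

Calling SimulateFailedStartup() returns promptly because the destructor
signals the task before joining. If the stop request were removed from
Shutdown(), the same call would block in join(), which is the kind of
stuck shutdown reported in this issue.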

See #KUDU-3419 for details.

Change-Id: I8c9a4f4158fcb0a36499345e00ee72c65f5fefe0
Reviewed-on: http://gerrit.cloudera.org:8080/19203
Reviewed-by: Yingchun Lai <acelyc1112...@gmail.com>
Reviewed-by: Alexey Serbin <ale...@apache.org>
Tested-by: Alexey Serbin <ale...@apache.org>


> Tablet server maybe get stuck when loading tablet metadata failed
> -----------------------------------------------------------------
>
>                 Key: KUDU-3419
>                 URL: https://issues.apache.org/jira/browse/KUDU-3419
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Xixu Wang
>            Priority: Major
>         Attachments: image-2022-11-04-14-57-49-684.png, 
> image-2022-11-04-14-59-54-665.png, image-2022-11-04-15-25-05-437.png, 
> image-2022-11-04-15-29-27-092.png, image-2022-11-04-15-30-08-892.png, 
> image-2022-11-04-15-32-34-366.png
>
>
> The tablet server may get stuck when loading tablet metadata fails.
> The following steps reproduce the bug.
> 1. Change the owner of one tablet metadata file to root. We use the account 
> *kudu* to run Kudu.
> !image-2022-11-04-14-57-49-684.png!
> 2. Start an instance of the tablet server. A permission error will be seen:
> !image-2022-11-04-15-29-27-092.png!
> 3. The tablet server gets stuck and does not exit automatically.
> !image-2022-11-04-15-30-08-892.png!
> 4. The pstack output is as follows. As we can see, the tablet server cannot 
> exit because the ThreadPool cannot be shut down: TxnStalenessTrackerTask is 
> still running, which prevents the thread pool from shutting down.
> !image-2022-11-04-15-32-34-366.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
