daicheng created KUDU-3537:
------------------------------

             Summary: Could not remove renamed recovery dir(nfs) when kudu restarts
                 Key: KUDU-3537
                 URL: https://issues.apache.org/jira/browse/KUDU-3537
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.16.0
         Environment: kudu on k8s
            Reporter: daicheng

I configured the Kudu directories on NFS in a k8s deployment and inserted some data into Kudu. After restarting Kudu, the tserver fails to bootstrap with an error like:
{code:java}
IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: One or more errors occurred
{code}
The issue does not occur when the directories are on local disk.

Here are some error details:
{code:java}
 Config source |        Replicas        | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
 master        | A* B                   |              |              | Yes
 A             | [config not available] |              |              |
 B             | [config not available] |              |              |

Tablet 1bb9b2f91c3f48d7a97fb974112dedd6 of table 'impala::test.test_kudu' is unavailable: 2 replica(s) not RUNNING
  1bf087d776394884b2031385cd7e8b82 (kudu-tserver-0.kudu-tservers.qilu-local.svc.cluster.local:7050): not running
    State: FAILED
    Data state: TABLET_DATA_READY
    Last status: IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703663028897150: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703663028897150: One or more errors occurred
  ea0e0a381c284877aa234228ed81a24f (kudu-tserver-1.kudu-tservers.qilu-local.svc.cluster.local:7050): not running [LEADER]
    State: FAILED
    Data state: TABLET_DATA_READY
    Last status: IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: One or more errors occurred
{code}
{code:java}
W1227 07:43:15.222187 74 env_posix.cc:2337] Could not delete directory: IO error: /var/lib/kudu/tserver/wals/3b734a27abc74768ad6cff599b66f0f1.recovery-1703662995205917: Directory not empty (error 39)
W1227 07:43:15.222219 74 env_posix.cc:2063] Error running callback with file /var/lib/kudu/tserver/wals/3b734a27abc74768ad6cff599b66f0f1.recovery-1703662995205917 during walk: IO error: /var/lib/kudu/tserver/wals/3b734a27abc74768ad6cff599b66f0f1.recovery-1703662995205917: Directory not empty (error 39)
E1227 07:43:15.261075 74 ts_tablet_manager.cc:1378] T 3b734a27abc74768ad6cff599b66f0f1 P ea0e0a381c284877aa234228ed81a24f: Tablet failed to bootstrap: IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/3b734a27abc74768ad6cff599b66f0f1.recovery-1703662995205917: /var/lib/kudu/tserver/wals/3b734a27abc74768ad6cff599b66f0f1.recovery-1703662995205917: One or more errors occurred
I1227 07:43:15.261124 74 ts_tablet_manager.cc:1356] T 3b734a27abc74768ad6cff599b66f0f1 P ea0e0a381c284877aa234228ed81a24f: Time spent bootstrapping tablet: real 0.213s user 0.070s sys 0.035s
I1227 07:43:15.261147 74 tablet_replica.cc:323] stopping tablet replica
I1227 07:43:15.261160 74 raft_consensus.cc:2227] T 3b734a27abc74768ad6cff599b66f0f1 P ea0e0a381c284877aa234228ed81a24f [term 1 FOLLOWER]: Raft consensus shutting down.
I1227 07:43:15.261169 74 raft_consensus.cc:2256] T 3b734a27abc74768ad6cff599b66f0f1 P ea0e0a381c284877aa234228ed81a24f [term 1 FOLLOWER]: Raft consensus is shut down!
I1227 07:43:15.261204 74 tablet_bootstrap.cc:492] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f: Bootstrap starting.
I1227 07:43:15.452575 74 tablet_bootstrap.cc:492] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f: Bootstrap replayed 1/1 log segments. Stats: ops{read=4406 overwritten=0 applied=4406 ignored=2} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0} orphaned_commits=0. Pending: 0 replicates
W1227 07:43:15.469259 74 env_posix.cc:2337] Could not delete directory: IO error: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: Directory not empty (error 39)
W1227 07:43:15.469303 74 env_posix.cc:2063] Error running callback with file /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637 during walk: IO error: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: Directory not empty (error 39)
E1227 07:43:15.504146 74 ts_tablet_manager.cc:1378] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f: Tablet failed to bootstrap: IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: /var/lib/kudu/tserver/wals/1bb9b2f91c3f48d7a97fb974112dedd6.recovery-1703662995452637: One or more errors occurred
I1227 07:43:15.504194 74 ts_tablet_manager.cc:1356] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f: Time spent bootstrapping tablet: real 0.243s user 0.062s sys 0.046s
I1227 07:43:15.504212 74 tablet_replica.cc:323] stopping tablet replica
I1227 07:43:15.504217 74 raft_consensus.cc:2227] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f [term 1 FOLLOWER]: Raft consensus shutting down.
I1227 07:43:15.504230 74 raft_consensus.cc:2256] T 1bb9b2f91c3f48d7a97fb974112dedd6 P ea0e0a381c284877aa234228ed81a24f [term 1 FOLLOWER]: Raft consensus is shut down!
I1227 07:43:15.504251 74 tablet_bootstrap.cc:492] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f: Bootstrap starting.
I1227 07:43:15.669176 74 tablet_bootstrap.cc:492] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f: Bootstrap replayed 1/1 log segments. Stats: ops{read=4975 overwritten=0 applied=4975 ignored=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0} orphaned_commits=0. Pending: 0 replicates
W1227 07:43:15.687026 74 env_posix.cc:2337] Could not delete directory: IO error: /var/lib/kudu/tserver/wals/d7eff00a19c44c728b4d46505c1ac5f2.recovery-1703662995669230: Directory not empty (error 39)
W1227 07:43:15.687069 74 env_posix.cc:2063] Error running callback with file /var/lib/kudu/tserver/wals/d7eff00a19c44c728b4d46505c1ac5f2.recovery-1703662995669230 during walk: IO error: /var/lib/kudu/tserver/wals/d7eff00a19c44c728b4d46505c1ac5f2.recovery-1703662995669230: Directory not empty (error 39)
E1227 07:43:15.722580 74 ts_tablet_manager.cc:1378] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f: Tablet failed to bootstrap: IO error: Could not remove renamed recovery dir /var/lib/kudu/tserver/wals/d7eff00a19c44c728b4d46505c1ac5f2.recovery-1703662995669230: /var/lib/kudu/tserver/wals/d7eff00a19c44c728b4d46505c1ac5f2.recovery-1703662995669230: One or more errors occurred
I1227 07:43:15.722630 74 ts_tablet_manager.cc:1356] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f: Time spent bootstrapping tablet: real 0.218s user 0.073s sys 0.048s
I1227 07:43:15.722642 74 tablet_replica.cc:323] stopping tablet replica
I1227 07:43:15.722648 74 raft_consensus.cc:2227] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f [term 2 FOLLOWER]: Raft consensus shutting down.
I1227 07:43:15.722656 74 raft_consensus.cc:2256] T d7eff00a19c44c728b4d46505c1ac5f2 P ea0e0a381c284877aa234228ed81a24f [term 2 FOLLOWER]: Raft consensus is shut down!
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)