[
https://issues.apache.org/jira/browse/HDDS-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828634#comment-17828634
]
Hongbing Wang commented on HDDS-10177:
--------------------------------------
I am posting some logs from our cluster related to this ticket. So far, no
other impact has been found.
{noformat}
2024-03-20 01:51:29,944 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.RDBSnapshotProvider: Ratis snapshot transfer is
complete.
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: Installing checkpoint with
OMTransactionInfo 4#6100006845
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
KeyDeletingService
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
DirectoryDeletingService
2024-03-20 01:51:37,539 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
OpenKeyCleanupService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
SstFilteringService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
SnapshotDeletingService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
MultipartUploadCleanupService
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
OzoneManagerStateMachine is pausing
2024-03-20 01:51:37,540 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Stopping
OMDoubleBuffer flush thread
2024-03-20 01:51:37,541 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ipc.Server: Stopping server on 9862
2024-03-20 01:51:37,549 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: RPC server is stopped. Spend 9 ms.
2024-03-20 01:51:37,550 [om3-InstallSnapshotThread] INFO
org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Shutting down
CompactionDagPruningService.
2024-03-20 01:51:39,070 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: metadataManager is stopped. Spend 1520
ms.
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: Replaced DB with checkpoint from OM:
om2, term: 4, index: 6100006845, time: 58 ms
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down
executorService: 'SnapDiffExecutor'
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down
executorService: 'SstDumpToolExecutor'
2024-03-20 01:51:39,128 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service
SnapshotDiffCleanupService
2024-03-20 01:51:39,131 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.helpers.OmKeyInfo: OmKeyInfo.getCodec ignorePipeline
= true
2024-03-20 01:51:39,136 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.db.DBStoreBuilder: Using RocksDB DBOptions from
om.db.ini file
2024-03-20 01:51:40,822 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.db.RocksDatabase:
ozone.om.skip.error.close.rocksdb value is: true.
2024-03-20 01:51:40,853 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: S3 Multi-Tenancy is disabled
2024-03-20 01:51:40,854 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OmSnapshotManager: Ozone filesystem snapshot feature
is enabled.
2024-03-20 01:51:40,855 [om3-InstallSnapshotThread] WARN
org.apache.hadoop.hdds.server.ServerUtils: ozone.om.snapshot.diff.db.dir is not
configured. We recommend adding this setting. Falling
back to ozone.metadata.dirs instead.
2024-03-20 01:51:40,864 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.hdds.utils.NativeLibraryLoader: Loading Library:
ozone_rocksdb_tools
2024-03-20 01:51:40,865 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down
executorService: 'SstDumpToolExecutor'
2024-03-20 01:51:40,867 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.TrashPolicyOzone: Ozone Manager trash configuration:
Deletion interval = 10080 minutes, Emptier interval = 1440 minutes.
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine:
OzoneManagerStateMachine is un-pausing
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: Reloaded OM state with Term: 4 and
Index: 6100006845. Spend 1740 ms
2024-03-20 01:51:40,869 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: Creating RPC Server
2024-03-20 01:51:41,206 [om3-InstallSnapshotThread] INFO
org.reflections.Reflections: Reflections took 335 ms to scan 8 urls, producing
23 keys and 661 values [using 96 cores]
2024-03-20 01:51:41,210 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class
java.util.concurrent.LinkedBlockingQueue, queueCapacity: 20000, scheduler:
class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2024-03-20 01:51:41,210 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ipc.Server: Listener at localhost:9862
2024-03-20 01:51:41,226 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: RPC server is re-started. Spend 356 ms.
2024-03-20 01:51:55,727 [om3-InstallSnapshotThread] INFO
org.apache.hadoop.ozone.om.OzoneManager: Install Checkpoint is finished with
Term: 4 and Index: 6100006845. Spend 18189 ms.
2024-03-20 01:51:55,727 [om3-InstallSnapshotThread] INFO
org.apache.ratis.server.impl.SnapshotInstallationHandler:
om3@group-197E298202B9: StateMachine successfully installed snapshot index
6100006845. Reloading the StateMachine.
{noformat}
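For illustration: the sequence in the log above (stop the RPC server, replace the DB, reload state, re-create the RPC server) can interleave with a concurrent stop(), which is the situation the ticket describes. Below is a minimal sketch of one possible guard, assuming a shared lifecycle lock and a shuttingDown flag. FakeRpcServer, FakeManager, lifecycleLock and shuttingDown are hypothetical names made up for this example. This is not the actual OzoneManager code, and not necessarily the fix applied for this ticket.

```java
/** Hypothetical model of the stop/installCheckpoint race; not real Ozone code. */
public class RestartGuardSketch {

  /** Stand-in for the RPC server: it only tracks whether it is running. */
  static final class FakeRpcServer {
    private boolean running = true;

    synchronized void stop() { running = false; }

    synchronized void start() { running = true; }

    synchronized boolean isRunning() { return running; }
  }

  /** Stand-in for the manager that owns the RPC server. */
  static final class FakeManager {
    // Assumption: one lock serializes stop() and installCheckpoint().
    private final Object lifecycleLock = new Object();
    private final FakeRpcServer rpc = new FakeRpcServer();
    private boolean shuttingDown = false;

    /** Called by the thread that shuts the manager down. */
    void stop() {
      synchronized (lifecycleLock) {
        shuttingDown = true;
        rpc.stop();
      }
    }

    /**
     * Called by the snapshot-install thread.
     * Returns true if the RPC server was restarted, false if the restart
     * was skipped because shutdown had already begun.
     */
    boolean installCheckpoint() {
      synchronized (lifecycleLock) {
        rpc.stop();
        // ... replace DB with checkpoint and reload state (omitted) ...
        if (shuttingDown) {
          return false; // do not bring the RPC server back during shutdown
        }
        rpc.start();
        return true;
      }
    }

    boolean isRpcRunning() { return rpc.isRunning(); }
  }

  public static void main(String[] args) {
    FakeManager manager = new FakeManager();
    System.out.println("restarted before stop: " + manager.installCheckpoint());
    manager.stop();
    System.out.println("restarted after stop: " + manager.installCheckpoint());
    System.out.println("rpc running at end: " + manager.isRpcRunning());
  }
}
```

With a guard like this, a join() on the RPC server issued after stop() would not end up waiting on a server instance that the install thread started afterwards.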
> OM RPC server restarted by InstallSnapshotThread during shutdown
> ----------------------------------------------------------------
>
> Key: HDDS-10177
> URL: https://issues.apache.org/jira/browse/HDDS-10177
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Attila Doroszlai
> Assignee: Sammi Chen
> Priority: Major
> Attachments: 2024-01-20T18-36-42_926-jvmRun1.dump,
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices-output.txt,
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.txt
>
>
> TestSnapshotBackgroundServices was successful:
> {code}
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 171.3 s -- in
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices
> {code}
> but it timed out during post-test cluster shutdown, because it was waiting
> indefinitely for the RPC server to stop:
> {code}
> "main"
> java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.ipc.Server.join(Server.java:3569)
> at
> org.apache.hadoop.ozone.om.OzoneManager.join(OzoneManager.java:2286)
> at
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopOM(MiniOzoneClusterImpl.java:558)
> at
> org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.stop(MiniOzoneHAClusterImpl.java:311)
> at
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:453)
> at
> org.apache.hadoop.ozone.om.TestSnapshotBackgroundServices.shutdown(TestSnapshotBackgroundServices.java:202)
> {code}
> The problem is that {{InstallSnapshotThread}} restarted the RPC server in the
> meantime:
> {code}
> 2024-01-20 18:37:17,649 [main] INFO ozone.MiniOzoneHAClusterImpl
> (MiniOzoneHAClusterImpl.java:stop(310)) - Stopping the OzoneManager omNode-3
> 2024-01-20 18:37:17,649 [main] INFO om.OzoneManager
> (OzoneManager.java:stop(2204)) - omNode-3[localhost:15012]: Stopping Ozone
> Manager
> 2024-01-20 18:37:17,650 [main] INFO ipc.Server (Server.java:stop(3523)) -
> Stopping server on 15012
> ...
> 2024-01-20 18:37:17,913 [omNode-3-InstallSnapshotThread] INFO ipc.Server
> (Server.java:<init>(1287)) - Listener at localhost:15012
> 2024-01-20 18:37:17,932 [omNode-3-InstallSnapshotThread] INFO
> om.OzoneManager (OzoneManager.java:installCheckpoint(3863)) - RPC server is
> re-started. Spend 377 ms.
> {code}