This is an automated email from the ASF dual-hosted git repository.

weichiu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 9f6b462cd HDDS-14626. [Docs] [System Internals] OzoneManager Disk 
Layout. (#336)
9f6b462cd is described below

commit 9f6b462cdde41d28c72c2b82d4b4442e8c3185e3
Author: Uma Maheswara Rao G <[email protected]>
AuthorDate: Thu Feb 12 14:51:17 2026 -0800

    HDDS-14626. [Docs] [System Internals] OzoneManager Disk Layout. (#336)
---
 .../01-ozone-manager/01-disk-layout.md             | 102 ++++++++++++++++++++-
 1 file changed, 101 insertions(+), 1 deletion(-)

diff --git 
a/docs/07-system-internals/01-components/01-ozone-manager/01-disk-layout.md 
b/docs/07-system-internals/01-components/01-ozone-manager/01-disk-layout.md
index ec034d6a4..641f38a67 100644
--- a/docs/07-system-internals/01-components/01-ozone-manager/01-disk-layout.md
+++ b/docs/07-system-internals/01-components/01-ozone-manager/01-disk-layout.md
@@ -3,4 +3,104 @@ sidebar_label: Disk Layout
 ---
 # Ozone Manager Disk Layout
 
-**TODO:** File a subtask under 
[HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this 
page or section.
+## **Overview**
+
+Apache Ozone separates metadata management across different services to ensure 
scalability. The Ozone Manager (OM) manages the namespace metadata: volumes, 
buckets, and keys. This document describes the configuration keys that control 
where the OM stores its data and the resulting on-disk directory layout.
+
+## **Core Metadata Configurations**
+
+The following configuration keys define where the Ozone Manager stores its 
persistent data. For production environments, it is recommended to host these 
directories on NVMe/SSDs to ensure high performance.
+
+- **`ozone.om.db.dirs`**: Specifies the dedicated location for the Ozone 
Manager RocksDB.  
+- **`ozone.metadata.dirs`**: Serves as the default location for 
security-related metadata (keys and certificates) and is often used as a 
fallback if specific DB directories are not defined.  
+- **`ozone.om.ratis.storage.dir`**: Defines the storage location for Ratis 
(Raft) logs, which are essential for Ozone Manager High Availability (HA).
+
+## **On-Disk Directory Structure**
+
+A typical Ozone Manager metadata directory (e.g., `/var/lib/hadoop-ozone/om/`) 
is organized as follows:
+
+```text
+/var/lib/hadoop-ozone/om
+├── data                              # The path configured for 
ozone.om.db.dirs
+│   ├── db.checkpoints/               # Point-in-time snapshots of OM DB for 
external tools (e.g., Recon)
+│   ├── om/
+│   │   └── current/
+│   │       └── VERSION               # Metadata identifying clusterID, 
omUuid, and layout version
+│   ├── om.db/                        # The active RocksDB instance (Namespace 
State)
+│   │   ├── *.sst                     # Static Sorted Table files (immutable 
data blocks)
+│   │   ├── *.log                     # RocksDB Write-Ahead Logs (WAL)
+│   │   ├── MANIFEST                  # Records changes to the RocksDB 
structure
+│   │   ├── CURRENT                   # Pointer to the latest manifest file
+│   │   └── snapshots/                # Internal Ozone Bucket Snapshots
+│   │       ├── om.db-<name1>/        # Hard-linked checkpoint for Snapshot 1
+│   │       │   ├── *.sst
+│   │       │   ├── CURRENT
+│   │       │   └── MANIFEST
+│   │       └── om.db-<name2>/        # Hard-linked checkpoint for Snapshot 2
+│   └── omMetrics                     # OM-specific performance and 
operational metrics
+├── ozone-metadata                    # The path configured for 
ozone.metadata.dirs
+│   └── om/
+│       ├── certs/                    # Public certificates and SCM CA chain
+│       └── keys/                     # RSA Private and Public PEM keys
+└── ratis/                            # Raft replication logs (HA). The path 
configured for ozone.om.ratis.storage.dir
+    ├── <group-uuid>/                 # Ratis (Raft) group ID
+    │   ├── current/
+    │   │   ├── log_inprogress_<id>   # Active transaction logs
+    │   │   └── raft-meta.conf        # Ratis membership and configuration 
metadata
+    └── snapshot/                     # Ratis State Machine snapshots (for 
node synchronization)
+```
+
+## **Detailed Component Breakdown**
+
+### **1\. RocksDB (`om.db`)**
+
+The `om.db` directory contains the RocksDB state store. This is the "source of 
truth" for the namespace. It stores information about every volume, bucket, and 
key in the system. Because Ozone handles billions of objects, the performance 
of the underlying storage for this directory is critical.
+
+In Ozone, snapshots are implemented as RocksDB checkpoints. When a bucket 
snapshot is created, OM creates a directory under `snapshots/` and hard-links 
the existing `.sst` files into it. Because no data blocks are copied, snapshots 
are created almost instantly and consume almost no additional storage at first.
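
The hard-link behavior can be illustrated with a small Python sketch. It does not call RocksDB or Ozone: the function names, the file name `000123.sst`, and the snapshot name `om.db-example` are hypothetical, and a real checkpoint also carries `MANIFEST` and `CURRENT`, as the directory tree shows. The sketch only demonstrates that a hard-linked `.sst` file shares its inode with the original, so no data blocks are duplicated.

```python
import os
import tempfile
from typing import List


def hard_link_sst_files(db_dir: str, checkpoint_dir: str) -> List[str]:
    """Hard-link every .sst file found directly in db_dir into checkpoint_dir.

    Illustration only; this is not how Ozone or RocksDB is invoked.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(db_dir)):
        if name.endswith(".sst"):
            os.link(os.path.join(db_dir, name),
                    os.path.join(checkpoint_dir, name))
            linked.append(name)
    return linked


def demo_hard_link() -> bool:
    """Return True if the linked file shares the original's inode."""
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "om.db")
        os.makedirs(db_dir)
        sst_path = os.path.join(db_dir, "000123.sst")  # hypothetical file name
        with open(sst_path, "wb") as f:
            f.write(b"x" * 4096)

        snapshot_dir = os.path.join(db_dir, "snapshots", "om.db-example")
        hard_link_sst_files(db_dir, snapshot_dir)

        original = os.stat(sst_path)
        linked = os.stat(os.path.join(snapshot_dir, "000123.sst"))
        # Same inode and a link count of 2: one copy of the data, two names.
        return original.st_ino == linked.st_ino and original.st_nlink == 2
```

Because both names point at the same inode, the space used by a snapshot grows only as the live database later replaces those files through compaction while the snapshot keeps the old ones alive.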
+
+### **2\. The VERSION File**
+
+The `VERSION` file is created during OM initialization (`ozone om --init`). 
It ties the OM to a specific cluster and records the layout version of the 
on-disk data, which guards against running software that does not match that 
layout.
+
+Key fields include:
+
+- **nodeType**: Set to OM.  
+- **clusterID**: The unique ID of the Ozone cluster.  
+- **omUuid**: The unique identifier for this specific OM instance.  
+- **layoutVersion**: The software-specific data layout version.
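
As a rough sketch of how these fields could be inspected, the Python below parses a `VERSION` file and checks it against an expected cluster ID. It assumes the file uses a Java-properties style (`key=value` lines with `#` comments); that format, the function names, and every value in the sample are assumptions for illustration and are not taken from a real cluster.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("nodeType", "clusterID", "omUuid", "layoutVersion")


def parse_version(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_version(fields: Dict[str, str], expected_cluster_id: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in fields]
    if "nodeType" in fields and fields["nodeType"] != "OM":
        problems.append("nodeType is not OM")
    if "clusterID" in fields and fields["clusterID"] != expected_cluster_id:
        problems.append("clusterID mismatch")
    return problems


# Hypothetical sample; all values are made up for illustration.
SAMPLE = """\
# example VERSION contents
nodeType=OM
clusterID=CID-example
omUuid=om-example
layoutVersion=0
"""
```

For example, `check_version(parse_version(SAMPLE), "CID-example")` returns an empty list, while passing a different cluster ID reports a `clusterID mismatch`.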
+
+### **3\. Security Metadata**
+
+Located under `ozone.metadata.dirs`, the security sub-directory contains the 
identity of the OM.
+
+- **keys/**: Contains `private.pem` and `public.pem`.  
+- **certs/**: Contains the OM’s certificate (signed by the SCM) and the SCM CA 
certificate. These are required for secure communication (mTLS) within the 
cluster.
+
+### **4\. Ratis Logs**
+
+The `ratis/` directory is critical for HA clusters. It stores the transaction 
logs that must be replicated across the OM quorum.
+
+- **Logs**: Hold recent transactions that have been replicated through Raft 
but are not yet covered by a Ratis snapshot.
+- **Snapshots**: These are Ratis snapshots (not bucket snapshots). They allow 
a new or lagging OM node to recover its state quickly without replaying the 
entire history of logs.
+
+### **5\. `db.checkpoints`**
+
+The `db.checkpoints` directory serves as a dedicated storage area for temporary, 
read-only snapshots of the active OM RocksDB. These checkpoints are primarily 
utilized by management and observability tools, such as Ozone Recon, to perform 
out-of-band analysis without impacting the performance of the live namespace. 
Unlike the user-facing bucket snapshots that reside within the `om.db` 
hierarchy, these checkpoints are typically short-lived and represent a full 
point-in-time state of the ent [...]
+
+### Recommended Storage Configuration Mapping
+
+| Path Description | Configuration Key | Storage Type Recommendation | Purpose 
|
+| :--- | :--- | :--- | :--- |
+| **OM Metadata Database** | `ozone.om.db.dirs` | **NVMe (strongly 
recommended)** | Stores the OM RocksDB database containing volumes, buckets, 
keys, and namespace state. This is the most latency-sensitive component in 
Ozone. Poor IOPS directly impacts client operation latency (PUT/DELETE/LIST). |
+| **OM Ratis Logs** | `ozone.om.ratis.storage.dir` | **NVMe or very fast SSD** 
| Holds the Raft write-ahead log (WAL) for OM consensus. Every metadata 
mutation must be fsynced before commit. Slow disks increase write latency 
across the entire cluster because clients wait for quorum commit. |
+| **General Ozone Metadata / Security Material** | `ozone.metadata.dirs` | 
**SSD preferred (HDD acceptable for small clusters)** | Base directory used by 
multiple Ozone services for certificates, keys, SCM/OM local metadata, and 
service state. Not on the critical write path like RocksDB or Ratis, but must 
be reliable and persistent. |
+
+## **Layout Implementation for Different Environments**
+
+### **Development/Test Environments**
+
+For simplicity, a single "All-in-One" location can be used by setting only 
`ozone.metadata.dirs`. All services (OM, SCM, DN) then store their metadata in 
sub-folders under this single path.
+
+### **Production Environments**
+
+Separating these directories is strongly recommended. The `om.db` and Ratis 
logs should reside on high-IOPS storage (SSDs/NVMe) to minimize latency for 
namespace operations, while security keys and certificates can remain on 
standard persistent storage.
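
One way to sanity-check that separation is to compare the filesystem device behind each configured directory. The Python sketch below groups configuration keys by `st_dev`. The function names are hypothetical, the demo uses temporary directories as stand-ins for real mount points, and matching or differing device IDs are only a hint: partitions or logical volumes on one physical disk still report different devices.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def keys_sharing_a_device(dirs: Dict[str, str]) -> List[List[str]]:
    """Group configuration keys whose directories share a filesystem device.

    Only groups containing more than one key are returned.
    """
    by_device = defaultdict(list)
    for key, path in dirs.items():
        by_device[os.stat(path).st_dev].append(key)
    return [sorted(keys) for keys in by_device.values() if len(keys) > 1]


def demo_shared_device() -> List[List[str]]:
    # Sibling temp directories normally live on one filesystem,
    # so all three keys are expected to be reported together.
    with tempfile.TemporaryDirectory() as root:
        dirs = {}
        for key in ("ozone.om.db.dirs",
                    "ozone.om.ratis.storage.dir",
                    "ozone.metadata.dirs"):
            path = os.path.join(root, key)
            os.makedirs(path)
            dirs[key] = path
        return keys_sharing_a_device(dirs)
```

On a production host, an empty result for the three real paths suggests they sit on distinct filesystems; any non-empty group names the keys that currently share one.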

