ivandika3 commented on code in PR #407:
URL: https://github.com/apache/ozone-site/pull/407#discussion_r3214197051

##########
docs/07-system-internals/02-data-operations/02-read.md:
##########
@@ -3,27 +3,118 @@ draft: true
 sidebar_label: Read
 ---

-# Implementation of Read Operations
-
-**TODO:** File a subtask under [HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this page or section.
-
-## Reading Metadata
-
-## Reading Data
-
-Trace every part of a read request from beginning to end. This includes:
-
-- Client getting encryption keys
-- Client calling OM to create key
-- OM validating client's Kerberos principal
-- OM checking permissions (Ranger or Native ACLs)
-- OM generating block tokens from the shared secret previously retrieved from SCM
-- OM getting block locations from SCM or from its cache.
-- OM returning container, blocks, pipeline, block tokens
-- Client sending block tokens and Datanode validating based on the shared secret from SCM
-- Client sending read chunk requests to Datanode to fetch the data.
-  - For replication:
-    - Include topology choices of which Datanodes to use
-    - Include failover handling
-  - For EC, link to the [EC feature page](../features/erasure-coding).
-- Client validating checksums
+# Apache Ozone Internals: Read Operation Implementation Guide
+
+This guide provides a comprehensive trace of a read request in Apache Ozone, including metadata resolution, security (Block Tokens), Transparent Data Encryption (TDE), and Authorization (Ranger/Native ACLs).
+
+---
+
+## 1. Phase 1: Request & Authorization (Client & OM)
+
+### 1.1 Initiating the Request
+
+The application calls `OzoneBucket.readKey(key)`. The client sends a `lookupKey` RPC to the Ozone Manager (OM).
+
+### 1.2 OM: Authorization Check
+
+Before processing the lookup, the OM must authorize the user.
+
+1. **Entry Point:** `OmMetadataReader.checkAcls()` is called within the `lookupKey` flow.
+2. **Authorizer Selection:** Based on configuration (`ozone.acl.authorizer.class`), OM uses either:
+   - **Native Authorizer:** Uses Ozone's internal ACLs stored in RocksDB.
+   - **Apache Ranger Authorizer:** Delegates the decision to the Ranger Ozone Plugin (`RangerOzoneAuthorizer`).
+3. **Authorization Logic:**
+   - OM builds an `OzoneObj` (Volume/Bucket/Key) and a `RequestContext` (User, IP, Action: READ).
+   - **Ranger Flow:** The plugin checks its local cache of policies (periodically synced from the Ranger Admin server). If a policy allows READ for the user/group on that resource, access is granted.
+   - **Fallback:** If Ranger is disabled or the Native authorizer is used, OM checks the object's ACL list for matching user/group permissions.
+
+### 1.3 OM: Key & Encryption Resolution
+
+Once authorized:
+
+1. **Key Lookup:** OM finds the `OmKeyInfo` in the `keyTable`.
+2. **Encryption Check:** If TDE is enabled, the `OmKeyInfo` contains the `EDEK` (Encrypted Data Encryption Key) and the EZ Key Name.
+3. **Block Allocation:** OM retrieves the `OmKeyLocationInfo` (Block IDs and Pipelines).

Review Comment:
   "Block Allocation" seems awkward here; use "Block Retrieval" instead. It would also be worth mentioning the container location cache implementation, as well as the locality-aware implementation (sorting Datanodes using the network topology that OM caches from SCM).

##########
docs/07-system-internals/02-data-operations/02-read.md:
##########
@@ -3,27 +3,118 @@ draft: true
 sidebar_label: Read
 ---

-# Implementation of Read Operations
-
-**TODO:** File a subtask under [HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this page or section.
-
-## Reading Metadata
-
-## Reading Data
-
-Trace every part of a read request from beginning to end. This includes:
-
-- Client getting encryption keys
-- Client calling OM to create key
-- OM validating client's Kerberos principal
-- OM checking permissions (Ranger or Native ACLs)
-- OM generating block tokens from the shared secret previously retrieved from SCM
-- OM getting block locations from SCM or from its cache.
-- OM returning container, blocks, pipeline, block tokens
-- Client sending block tokens and Datanode validating based on the shared secret from SCM
-- Client sending read chunk requests to Datanode to fetch the data.
-  - For replication:
-    - Include topology choices of which Datanodes to use
-    - Include failover handling
-  - For EC, link to the [EC feature page](../features/erasure-coding).
-- Client validating checksums
+# Apache Ozone Internals: Read Operation Implementation Guide
+
+This guide provides a comprehensive trace of a read request in Apache Ozone, including metadata resolution, security (Block Tokens), Transparent Data Encryption (TDE), and Authorization (Ranger/Native ACLs).
+
+---
+
+## 1. Phase 1: Request & Authorization (Client & OM)
+
+### 1.1 Initiating the Request
+
+The application calls `OzoneBucket.readKey(key)`. The client sends a `lookupKey` RPC to the Ozone Manager (OM).

Review Comment:
   Use `getKeyInfo` instead; `lookupKey` is already deprecated.

##########
docs/07-system-internals/02-data-operations/02-read.md:
##########
@@ -3,27 +3,118 @@ draft: true
 sidebar_label: Read
 ---

-# Implementation of Read Operations
-
-**TODO:** File a subtask under [HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this page or section.
-
-## Reading Metadata
-
-## Reading Data
-
-Trace every part of a read request from beginning to end. This includes:
-
-- Client getting encryption keys
-- Client calling OM to create key
-- OM validating client's Kerberos principal
-- OM checking permissions (Ranger or Native ACLs)
-- OM generating block tokens from the shared secret previously retrieved from SCM
-- OM getting block locations from SCM or from its cache.
-- OM returning container, blocks, pipeline, block tokens
-- Client sending block tokens and Datanode validating based on the shared secret from SCM
-- Client sending read chunk requests to Datanode to fetch the data.
-  - For replication:
-    - Include topology choices of which Datanodes to use
-    - Include failover handling
-  - For EC, link to the [EC feature page](../features/erasure-coding).
-- Client validating checksums
+# Apache Ozone Internals: Read Operation Implementation Guide
+
+This guide provides a comprehensive trace of a read request in Apache Ozone, including metadata resolution, security (Block Tokens), Transparent Data Encryption (TDE), and Authorization (Ranger/Native ACLs).
+
+---
+
+## 1. Phase 1: Request & Authorization (Client & OM)
+
+### 1.1 Initiating the Request
+
+The application calls `OzoneBucket.readKey(key)`. The client sends a `lookupKey` RPC to the Ozone Manager (OM).
+
+### 1.2 OM: Authorization Check
+
+Before processing the lookup, the OM must authorize the user.
+
+1. **Entry Point:** `OmMetadataReader.checkAcls()` is called within the `lookupKey` flow.
+2. **Authorizer Selection:** Based on configuration (`ozone.acl.authorizer.class`), OM uses either:
+   - **Native Authorizer:** Uses Ozone's internal ACLs stored in RocksDB.
+   - **Apache Ranger Authorizer:** Delegates the decision to the Ranger Ozone Plugin (`RangerOzoneAuthorizer`).
+3. **Authorization Logic:**
+   - OM builds an `OzoneObj` (Volume/Bucket/Key) and a `RequestContext` (User, IP, Action: READ).
+   - **Ranger Flow:** The plugin checks its local cache of policies (periodically synced from the Ranger Admin server). If a policy allows READ for the user/group on that resource, access is granted.
+   - **Fallback:** If Ranger is disabled or the Native authorizer is used, OM checks the object's ACL list for matching user/group permissions.
+
+### 1.3 OM: Key & Encryption Resolution
+
+Once authorized:
+
+1. **Key Lookup:** OM finds the `OmKeyInfo` in the `keyTable`.
+2. **Encryption Check:** If TDE is enabled, the `OmKeyInfo` contains the `EDEK` (Encrypted Data Encryption Key) and the EZ Key Name.
+3. **Block Allocation:** OM retrieves the `OmKeyLocationInfo` (Block IDs and Pipelines).
+4. **Block Token Generation:** OM generates a signed Block Token for each block using secret keys managed by the SCM.
+
+OM returns `OmKeyInfo` (Metadata + `EDEK` + Block Tokens) to the client.
+
+---
+
+## 2. Phase 2: Decryption Setup (Client & KMS)
+
+### 2.1 Decrypting the `EDEK`
+
+If the key is encrypted:
+
+1. **KMS Request:** The client sends the `EDEK` to the KMS (Key Management Server).
+2. **KMS Authorization:** The KMS also performs an authorization check (often via Ranger KMS plugin) to ensure the user can use the EZ Key for decryption.
+3. **`DEK` Retrieval:** KMS returns the raw `DEK` (Data Encryption Key) to the client.
+
+### 2.2 Initializing the Crypto Stream
+
+The client wraps the data stream in a `CryptoInputStream` initialized with the raw `DEK` and the IV from the metadata.
+
+---
+
+## 3. Phase 3: Data Retrieval (Client & Datanode)
+
+### 3.1 Fetching Encrypted Chunks
+
+The client's `ChunkInputStream` sends a `ReadChunk` request to a Datanode.
+
+- **Security:** The request includes the Block Token.
+- **Datanode Validation:** The Datanode verifies the token's signature using the Secret Keys it fetched from the SCM. This is the final "at-the-edge" authorization check.
+- **Data Transfer:** The Datanode reads the encrypted data from disk and streams it back.

Review Comment:
   This section needs to mention `ReadBlock` as well.

--
This is an automated message from the Apache Git Service.
