This is an automated email from the ASF dual-hosted git repository.

quantranhong1999 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit dfe6a2041838785aea753384452cef9cf5fd08ff
Author: Quan Tran <[email protected]>
AuthorDate: Wed May 6 16:49:00 2026 +0700

    JAMES-4182 Add documentation explains blob store design
---
 docs/modules/servers/nav.adoc                      |   2 +
 .../pages/distributed/architecture/blobstore.adoc  |   4 +
 .../pages/postgres/architecture/blobstore.adoc     |   4 +
 .../servers/partials/architecture/blobstore.adoc   | 102 +++++++++++++++++++++
 .../servers/partials/architecture/index.adoc       |   2 +
 5 files changed, 114 insertions(+)

diff --git a/docs/modules/servers/nav.adoc b/docs/modules/servers/nav.adoc
index b0030e7870..1e619f3879 100644
--- a/docs/modules/servers/nav.adoc
+++ b/docs/modules/servers/nav.adoc
@@ -14,6 +14,7 @@
 *** xref:distributed/architecture/index.adoc[]
 **** xref:distributed/architecture/implemented-standards.adoc[]
 **** xref:distributed/architecture/consistency-model.adoc[]
+**** xref:distributed/architecture/blobstore.adoc[]
 **** xref:distributed/architecture/specialized-instances.adoc[]
 **** xref:distributed/architecture/data-tiering.adoc[]
 *** xref:distributed/run/index.adoc[Run]
@@ -88,6 +89,7 @@
 *** xref:postgres/architecture/index.adoc[]
 **** xref:postgres/architecture/implemented-standards.adoc[]
 **** xref:postgres/architecture/consistency-model.adoc[]
+**** xref:postgres/architecture/blobstore.adoc[]
 **** xref:postgres/architecture/specialized-instances.adoc[]
 *** xref:postgres/run/index.adoc[]
 **** xref:postgres/run/run-java.adoc[Run with Java]
diff --git a/docs/modules/servers/pages/distributed/architecture/blobstore.adoc b/docs/modules/servers/pages/distributed/architecture/blobstore.adoc
new file mode 100644
index 0000000000..ea6c3469dc
--- /dev/null
+++ b/docs/modules/servers/pages/distributed/architecture/blobstore.adoc
@@ -0,0 +1,4 @@
+= Distributed James Server &mdash; BlobStore
+:navtitle: BlobStore
+
+include::partial$architecture/blobstore.adoc[]
diff --git a/docs/modules/servers/pages/postgres/architecture/blobstore.adoc b/docs/modules/servers/pages/postgres/architecture/blobstore.adoc
new file mode 100644
index 0000000000..db1e165838
--- /dev/null
+++ b/docs/modules/servers/pages/postgres/architecture/blobstore.adoc
@@ -0,0 +1,4 @@
+= Postgresql James server &mdash; BlobStore
+:navtitle: BlobStore
+
+include::partial$architecture/blobstore.adoc[]
diff --git a/docs/modules/servers/partials/architecture/blobstore.adoc b/docs/modules/servers/partials/architecture/blobstore.adoc
new file mode 100644
index 0000000000..dca52fc490
--- /dev/null
+++ b/docs/modules/servers/partials/architecture/blobstore.adoc
@@ -0,0 +1,102 @@
+James stores large, non-indexable binary payloads in a BlobStore. Typical examples
+are message bodies, attachments, deleted messages retained by the vault, and mail
+queue payloads.
+
+Mailbox, Mail Queue, and Deleted Messages Vault components rely on it.
+
+Server components usually depend on the higher-level `BlobStore`. `BlobStoreDAO`
+is the lower-level virtual storage abstraction implemented by concrete storage
+connectors such as memory, file, Cassandra, Postgres, and S3 compatible object
+stores. It allows James to compose storage features such as encryption or
+compression independently of the storage connector.
+
+== Abstraction layers
+
+Most James components use `BlobStore`, which is responsible for saving content
+and returning a `BlobId`. `BlobStoreDAO` is the lower-level persistence contract:
+it stores, reads, lists, and deletes blobs for a given `BucketName` and `BlobId`.
+
+Cross-cutting storage features can be composed around this DAO contract. For
+example, deduplication decides blob identifiers at the `BlobStore` level, while
+wrappers such as compression or encryption can transform payloads and metadata
+before delegating to the concrete storage connector.
+
+=== BlobStore implementations
+
+James composes several behaviors at the `BlobStore` level:
+
+* `PassThroughBlobStore` is the non-deduplicating strategy. It generates a new
+  `BlobId` for each save, delegates persistence to the configured
+  `BlobStoreDAO`, and deletes blobs directly.
+* `DeDuplicationBlobStore` is the deduplicating strategy. It derives `BlobId`
+  values from content hashes, so identical content can share the same stored
+  blob. A single delete does not remove the underlying blob immediately; garbage
+  collection is responsible for eventually removing unreferenced blobs.
+* `MetricableBlobStore` decorates another `BlobStore` with timing metrics.
+* `CachedBlobStore` decorates another `BlobStore` with a Cassandra-backed cache
+  for small, frequently read blobs.
+
+=== BlobStoreDAO implementations
+
+Concrete `BlobStoreDAO` implementations persist payloads in a storage backend,
+for example memory, file, Cassandra, Postgres, or S3 compatible object storage.
+
+Some `BlobStoreDAO` implementations are wrappers rather than final storage
+connectors:
+
+* `AESBlobStoreDAO` encrypts payload bytes before delegating writes to the
+  underlying DAO, and decrypts them transparently on reads. This protects blob
+  content at rest, especially when James stores blobs in third-party object
+  storage.
+* `ZstdBlobStoreDAO` can compress payload bytes before delegating writes to the
+  underlying DAO. When it stores compressed bytes, it records metadata such as
+  `content-encoding` and the original size. On reads, it uses this metadata to
+  transparently decompress the payload. This reduces storage usage and network
+  transfer for compressible blob content.
+
+AES and Zstd can be enabled together. In the Guice binding chain, compression
+wraps encryption: `ZstdBlobStoreDAO` delegates to `AESBlobStoreDAO`, which then
+delegates to the concrete storage DAO. Writes therefore compress first and
+encrypt afterwards; reads decrypt first and decompress afterwards. This ordering
+preserves the benefit of compression, as encrypted payloads are generally not
+compressible.
+
+== Logical buckets
+
+`BucketName` is a James logical namespace in the `BlobStoreDAO` contract. It is
+not an AWS S3 bucket name, even when the selected connector stores data in an S3
+compatible object store.
+
+Each connector maps this logical namespace to its own storage model. Depending
+on the implementation and configuration, a logical bucket can be stored as a
+directory, an object-storage bucket, a database partition, or another
+connector-specific representation. Code using `BlobStoreDAO` should only rely on
+the James `BucketName` abstraction.
+
+== Metadata
+
+Blob metadata stores side information needed to interpret a blob payload without
+changing the payload bytes or blob identifier. One use case is object store
+compression: James uses a marker such as `content-encoding` to detect a
+compressed payload and transparently decompress it when reading.
+
+James uses a hybrid metadata model:
+
+* Metadata actively interpreted by James should expose typed helpers or constants
+  in the API. For example, `BlobMetadata.contentEncoding()` reads the
+  `content-encoding` entry.
+* Other metadata stays available through the underlying map as an extension point
+  for James library users and custom storage implementations.
+  Custom storage metadata could be used by James library users or storage extensions to implement other use cases.
+
+Metadata-aware storage implementations and wrappers should preserve unknown
+metadata entries.
+
+=== Metadata names
+
+`BlobMetadataName` defines the portable metadata key convention:
+
+* names are case-insensitive and are canonicalized to lowercase;
+* names must be non-empty;
+* names must be shorter than 128 characters;
+* names can contain only ASCII letters, digits, and `-`.
diff --git a/docs/modules/servers/partials/architecture/index.adoc b/docs/modules/servers/partials/architecture/index.adoc
index 449a31c99e..634004753c 100644
--- a/docs/modules/servers/partials/architecture/index.adoc
+++ b/docs/modules/servers/partials/architecture/index.adoc
@@ -279,6 +279,8 @@ the same content will be stored one once.
 The downside is that deletion is more complicated, and a garbage collection needs to be run. A first implementation
 based on bloom filters can be used and triggered using the WebAdmin REST API.
 
+See xref:{xref-base}/architecture/blobstore.adoc[BlobStore architecture page] for more details.
+
 === Task Manager
 
 Allows to control and schedule long running tasks run by other
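
The new blobstore.adoc partial says that when AES and Zstd are both enabled,
compression wraps encryption, so writes compress first and encrypt afterwards
and reads decrypt first and decompress afterwards. The sketch below illustrates
that ordering with plain decorators. It is not James code: `SimpleDao`,
`MemoryDao`, `EncryptingDao`, and `CompressingDao` are hypothetical names, the
interface is a synchronous simplification of the `BlobStoreDAO` contract,
DEFLATE from `java.util.zip` stands in for Zstd, and AES-GCM with a random
12-byte IV is an assumed cipher mode chosen only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, simplified stand-in for the BlobStoreDAO contract.
interface SimpleDao {
    void save(String id, byte[] payload) throws Exception;
    byte[] read(String id) throws Exception;
}

// Final storage connector of the chain; read() assumes the id was saved before.
final class MemoryDao implements SimpleDao {
    final Map<String, byte[]> stored = new HashMap<>();
    public void save(String id, byte[] payload) { stored.put(id, payload.clone()); }
    public byte[] read(String id) { return stored.get(id).clone(); }
}

// Wrapper that encrypts on write and decrypts on read (AES-GCM is an assumption).
final class EncryptingDao implements SimpleDao {
    private static final int IV_LENGTH = 12;
    private final SimpleDao delegate;
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    EncryptingDao(SimpleDao delegate, byte[] rawKey) {
        this.delegate = delegate;
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    public void save(String id, byte[] payload) throws Exception {
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] encrypted = cipher.doFinal(payload);
        // Stored layout: IV followed by ciphertext and authentication tag.
        byte[] out = Arrays.copyOf(iv, iv.length + encrypted.length);
        System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
        delegate.save(id, out);
    }

    public byte[] read(String id) throws Exception {
        byte[] in = delegate.read(id);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, in, 0, IV_LENGTH));
        return cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
    }
}

// Wrapper that compresses on write and decompresses on read (DEFLATE, not Zstd).
final class CompressingDao implements SimpleDao {
    private final SimpleDao delegate;

    CompressingDao(SimpleDao delegate) { this.delegate = delegate; }

    public void save(String id, byte[] payload) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(out)) {
            deflate.write(payload);
        }
        delegate.save(id, out.toByteArray());
    }

    public byte[] read(String id) throws Exception {
        try (InflaterInputStream inflate =
                 new InflaterInputStream(new ByteArrayInputStream(delegate.read(id)))) {
            return inflate.readAllBytes();
        }
    }
}
```

Wiring the chain as `new CompressingDao(new EncryptingDao(new MemoryDao(), key))`
mirrors the order described in the partial: the outer wrapper sees the plain
payload and compresses it, and only the already-compressed bytes are encrypted.
Swapping the two wrappers would still round-trip, but the compressor would then
be fed ciphertext, which generally does not shrink.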

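The "Metadata names" section of the same partial lists four rules for portable
metadata keys. A minimal sketch of a validator following those rules is shown
below. `MetadataNameSketch` and `canonicalize` are hypothetical names used for
illustration; this is not the `BlobMetadataName` class from the commit, whose
actual API is not shown in this email.

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented naming rules.
final class MetadataNameSketch {
    private MetadataNameSketch() {}

    // Validates a metadata name and returns its lowercase canonical form.
    static String canonicalize(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("metadata name must be non-empty");
        }
        if (name.length() >= 128) {
            throw new IllegalArgumentException("metadata name must be shorter than 128 characters");
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-';
            if (!allowed) {
                throw new IllegalArgumentException("illegal character at index " + i);
            }
        }
        // Names are case-insensitive, so compare and store them in lowercase.
        return name.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, `canonicalize("Content-Encoding")` returns `content-encoding`,
while a name such as `bad_name` is rejected because `_` is outside the allowed
character set.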

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
