This is an automated email from the ASF dual-hosted git repository.

quantranhong1999 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/james-project.git

commit dfe6a2041838785aea753384452cef9cf5fd08ff
Author: Quan Tran <[email protected]>
AuthorDate: Wed May 6 16:49:00 2026 +0700

    JAMES-4182 Add documentation explains blob store design
---
 docs/modules/servers/nav.adoc                      |   2 +
 .../pages/distributed/architecture/blobstore.adoc  |   4 +
 .../pages/postgres/architecture/blobstore.adoc     |   4 +
 .../servers/partials/architecture/blobstore.adoc   | 102 +++++++++++++++++++++
 .../servers/partials/architecture/index.adoc       |   2 +
 5 files changed, 114 insertions(+)

diff --git a/docs/modules/servers/nav.adoc b/docs/modules/servers/nav.adoc
index b0030e7870..1e619f3879 100644
--- a/docs/modules/servers/nav.adoc
+++ b/docs/modules/servers/nav.adoc
@@ -14,6 +14,7 @@
 *** xref:distributed/architecture/index.adoc[]
 **** xref:distributed/architecture/implemented-standards.adoc[]
 **** xref:distributed/architecture/consistency-model.adoc[]
+**** xref:distributed/architecture/blobstore.adoc[]
 **** xref:distributed/architecture/specialized-instances.adoc[]
 **** xref:distributed/architecture/data-tiering.adoc[]
 *** xref:distributed/run/index.adoc[Run]
@@ -88,6 +89,7 @@
 *** xref:postgres/architecture/index.adoc[]
 **** xref:postgres/architecture/implemented-standards.adoc[]
 **** xref:postgres/architecture/consistency-model.adoc[]
+**** xref:postgres/architecture/blobstore.adoc[]
 **** xref:postgres/architecture/specialized-instances.adoc[]
 *** xref:postgres/run/index.adoc[]
 **** xref:postgres/run/run-java.adoc[Run with Java]
diff --git a/docs/modules/servers/pages/distributed/architecture/blobstore.adoc b/docs/modules/servers/pages/distributed/architecture/blobstore.adoc
new file mode 100644
index 0000000000..ea6c3469dc
--- /dev/null
+++ b/docs/modules/servers/pages/distributed/architecture/blobstore.adoc
@@ -0,0 +1,4 @@
+= Distributed James Server — BlobStore
+:navtitle: BlobStore
+
+include::partial$architecture/blobstore.adoc[]
diff --git a/docs/modules/servers/pages/postgres/architecture/blobstore.adoc b/docs/modules/servers/pages/postgres/architecture/blobstore.adoc
new file mode 100644
index 0000000000..db1e165838
--- /dev/null
+++ b/docs/modules/servers/pages/postgres/architecture/blobstore.adoc
@@ -0,0 +1,4 @@
+= Postgresql James server — BlobStore
+:navtitle: BlobStore
+
+include::partial$architecture/blobstore.adoc[]
diff --git a/docs/modules/servers/partials/architecture/blobstore.adoc b/docs/modules/servers/partials/architecture/blobstore.adoc
new file mode 100644
index 0000000000..dca52fc490
--- /dev/null
+++ b/docs/modules/servers/partials/architecture/blobstore.adoc
@@ -0,0 +1,102 @@
+James stores large, non-indexable binary payloads in a BlobStore. Typical examples
+are message bodies, attachments, deleted messages retained by the vault, and mail
+queue payloads.
+
+Mailbox, Mail Queue, and Deleted Messages Vault components rely on it.
+
+Server components usually depend on the higher-level `BlobStore`. `BlobStoreDAO`
+is the lower-level virtual storage abstraction implemented by concrete storage
+connectors such as memory, file, Cassandra, Postgres, and S3 compatible object
+stores. It allows James to compose storage features such as encryption or
+compression independently of the storage connector.
+
+== Abstraction layers
+
+Most James components use `BlobStore`, which is responsible for saving content
+and returning a `BlobId`. `BlobStoreDAO` is the lower-level persistence contract:
+it stores, reads, lists, and deletes blobs for a given `BucketName` and `BlobId`.
+
+Cross-cutting storage features can be composed around this DAO contract. For
+example, deduplication decides blob identifiers at the `BlobStore` level, while
+wrappers such as compression or encryption can transform payloads and metadata
+before delegating to the concrete storage connector.
+
+=== BlobStore implementations
+
+James composes several behaviors at the `BlobStore` level:
+
+* `PassThroughBlobStore` is the non-deduplicating strategy. It generates a new
+  `BlobId` for each save, delegates persistence to the configured
+  `BlobStoreDAO`, and deletes blobs directly.
+* `DeDuplicationBlobStore` is the deduplicating strategy. It derives `BlobId`
+  values from content hashes, so identical content can share the same stored
+  blob. A single delete does not remove the underlying blob immediately; garbage
+  collection is responsible for eventually removing unreferenced blobs.
+* `MetricableBlobStore` decorates another `BlobStore` with timing metrics.
+* `CachedBlobStore` decorates another `BlobStore` with a Cassandra-backed cache
+  for small, frequently read blobs.
+
+=== BlobStoreDAO implementations
+
+Concrete `BlobStoreDAO` implementations persist payloads in a storage backend,
+for example memory, file, Cassandra, Postgres, or S3 compatible object storage.
+
+Some `BlobStoreDAO` implementations are wrappers rather than final storage
+connectors:
+
+* `AESBlobStoreDAO` encrypts payload bytes before delegating writes to the
+  underlying DAO, and decrypts them transparently on reads. This protects blob
+  content at rest, especially when James stores blobs in third-party object
+  storage.
+* `ZstdBlobStoreDAO` can compress payload bytes before delegating writes to the
+  underlying DAO. When it stores compressed bytes, it records metadata such as
+  `content-encoding` and the original size. On reads, it uses this metadata to
+  transparently decompress the payload. This reduces storage usage and network
+  transfer for compressible blob content.
+
+AES and Zstd can be enabled together. In the Guice binding chain, compression
+wraps encryption: `ZstdBlobStoreDAO` delegates to `AESBlobStoreDAO`, which then
+delegates to the concrete storage DAO. Writes therefore compress first and
+encrypt afterwards; reads decrypt first and decompress afterwards. This ordering
+preserves the benefit of compression, as encrypted payloads are generally not
+compressible.
+
+== Logical buckets
+
+`BucketName` is a James logical namespace in the `BlobStoreDAO` contract. It is
+not an AWS S3 bucket name, even when the selected connector stores data in an S3
+compatible object store.
+
+Each connector maps this logical namespace to its own storage model. Depending
+on the implementation and configuration, a logical bucket can be stored as a
+directory, an object-storage bucket, a database partition, or another
+connector-specific representation. Code using `BlobStoreDAO` should only rely on
+the James `BucketName` abstraction.
+
+== Metadata
+
+Blob metadata stores side information needed to interpret a blob payload without
+changing the payload bytes or blob identifier. One use case is object store
+compression: James uses a marker such as `content-encoding` to detect a
+compressed payload and transparently decompress it when reading.
+
+James uses a hybrid metadata model:
+
+* Metadata actively interpreted by James should expose typed helpers or constants
+  in the API. For example, `BlobMetadata.contentEncoding()` reads the
+  `content-encoding` entry.
+* Other metadata stays available through the underlying map as an extension point
+  for James library users and custom storage implementations.
+  Custom storage metadata could be used by James library users or storage extensions to implement other use cases.
+
+Metadata-aware storage implementations and wrappers should preserve unknown
+metadata entries.
+
+=== Metadata names
+
+`BlobMetadataName` defines the portable metadata key convention:
+
+* names are case-insensitive and are canonicalized to lowercase;
+* names must be non-empty;
+* names must be shorter than 128 characters;
+* names can contain only ASCII letters, digits, and `-`.
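
Note (not part of the patch): the `BlobMetadataName` key convention listed at the end of the new `blobstore.adoc` partial could be enforced roughly as in the sketch below. This is only an illustration of the four documented rules. The class `MetadataNameSketch` and its methods `canonicalize` and `isValid` are hypothetical names and are not taken from the James code base, whose actual `BlobMetadataName` implementation may differ.

```java
import java.util.Locale;

// Hypothetical sketch of the documented naming rules; NOT the James BlobMetadataName class.
final class MetadataNameSketch {
    // "shorter than 128 characters" is read here as an exclusive upper bound.
    private static final int MAX_LENGTH_EXCLUSIVE = 128;

    private MetadataNameSketch() {
    }

    // Validates a raw name and returns its canonical lowercase form.
    static String canonicalize(String rawName) {
        if (rawName == null || rawName.isEmpty()) {
            throw new IllegalArgumentException("metadata name must be non-empty");
        }
        if (rawName.length() >= MAX_LENGTH_EXCLUSIVE) {
            throw new IllegalArgumentException("metadata name must be shorter than 128 characters");
        }
        for (int i = 0; i < rawName.length(); i++) {
            char c = rawName.charAt(i);
            boolean allowed = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-';
            if (!allowed) {
                throw new IllegalArgumentException("illegal character '" + c + "' in metadata name");
            }
        }
        // Locale.ROOT keeps lowercasing independent of the JVM default locale.
        return rawName.toLowerCase(Locale.ROOT);
    }

    // Convenience predicate built on top of canonicalize.
    static boolean isValid(String rawName) {
        try {
            canonicalize(rawName);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("Content-Encoding")); // prints "content-encoding"
    }
}
```

Under these assumptions, `Content-Encoding` and `content-encoding` map to the same key, while a name such as `content_encoding` is rejected because `_` is outside the allowed character set.
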
diff --git a/docs/modules/servers/partials/architecture/index.adoc b/docs/modules/servers/partials/architecture/index.adoc
index 449a31c99e..634004753c 100644
--- a/docs/modules/servers/partials/architecture/index.adoc
+++ b/docs/modules/servers/partials/architecture/index.adoc
@@ -279,6 +279,8 @@ the same content will be stored one once.
 The downside is that deletion is more complicated, and a garbage collection needs to be run. A first implementation
 based on bloom filters can be used and triggered using the WebAdmin REST API.
 
+See xref:{xref-base}/architecture/blobstore.adoc[BlobStore architecture page] for more details.
+
 === Task Manager
 
 Allows to control and schedule long running tasks run by other


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
