This is an automated email from the ASF dual-hosted git repository.
xuanwo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/opendal.git
The following commit(s) were added to refs/heads/main by this push:
new 620643ac7 RFC-6370: foyer layer (#6370)
620643ac7 is described below
commit 620643ac71934360da443d9d9f8604fe3577fe67
Author: Croxx <[email protected]>
AuthorDate: Mon Jul 7 21:47:05 2025 +0800
RFC-6370: foyer layer (#6370)
* RFC: foyer layer
Signed-off-by: MrCroxx <[email protected]>
* chore: fix typos
Signed-off-by: MrCroxx <[email protected]>
* Update core/src/docs/rfcs/0000_foyer_integration.md
---------
Signed-off-by: MrCroxx <[email protected]>
Co-authored-by: Xuanwo <[email protected]>
---
core/src/docs/rfcs/0000_foyer_integration.md | 111 +++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/core/src/docs/rfcs/0000_foyer_integration.md
b/core/src/docs/rfcs/0000_foyer_integration.md
new file mode 100644
index 000000000..ba53d83e1
--- /dev/null
+++ b/core/src/docs/rfcs/0000_foyer_integration.md
@@ -0,0 +1,111 @@
+- Proposal Name: `foyer_integration`
+- Start Date: 2025-07-07
+- RFC PR: [apache/opendal#6370](https://github.com/apache/opendal/pull/6370)
+- Tracking Issue:
[apache/opendal#6372](https://github.com/apache/opendal/issues/6372)
+
+# Summary
+
+Integrate [*foyer*](https://github.com/foyer-rs/foyer) hybrid cache support into
OpenDAL to improve performance and reduce cost.
+
+# Motivation
+
+Object storage is the storage option most commonly used by OpenDAL users. In
cloud-based object storage services such as AWS S3 and GCS, request latency is
often one to several orders of magnitude higher than that of local disks or
memory, and these services are often billed by the number of requests.
+
+Applications built on these cloud object storage services often need to
introduce caching to improve storage performance while reducing request
overhead. *Foyer* provides a hybrid cache that combines memory and disk,
offering a better balance between performance and cost. As a result, it has
become a dependency, alongside OpenDAL, of many systems built on cloud object
storage, e.g. RisingWave and SlateDB.
+
+However, regardless of which cache component is introduced, users have to
work with cache-related APIs in addition to the OpenDAL APIs. If *foyer*
is integrated into OpenDAL as an optional component, users get a friendlier,
more convenient, and more transparent way to use the cache.
+
+With the *foyer* integration, users benefit in the following
aspects:
+
+- Performance boost and cost reduction by caching with both memory and disk.
+- A completely transparent implementation, using the same operation APIs as
before.
+
+[RFC#6297](https://github.com/apache/opendal/pull/6297) has mentioned a
general cache layer design, and *foyer* can be integrated into OpenDAL as a
general cache in this way. However, this may not fully leverage *foyer*'s
capabilities:
+
+- *Foyer* supports automatic cache refilling on cache miss. The behavior
differs based on the reason for the cache miss and the statistics of the entry
(e.g. entry not in cache, disk operation throttled, age of entry, etc.). All of
these abilities are provided by a non-standard API, `fetch()`, which other cache
libraries don't have.
+- *Foyer* supports request deduplication on the same key. *Foyer* ensures that
for concurrent access to the same key, only one request actually accesses
the disk cache or remote storage, while the other requests wait for this
request to return and directly reuse its result, minimizing overhead
as much as possible.
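+
The deduplication behavior described in the second bullet can be illustrated with a minimal, synchronous sketch that uses only the Rust standard library. It is not *foyer*'s implementation or API: `SingleFlightCache` and its `fetch` method are hypothetical names, the `load` closure stands in for a disk cache or remote storage read, and threads stand in for concurrent async requests.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical single-flight cache (illustration only, not foyer's API).
/// Each key owns one `OnceLock` slot that is filled at most once.
pub struct SingleFlightCache {
    slots: Mutex<HashMap<String, Arc<OnceLock<Vec<u8>>>>>,
}

impl SingleFlightCache {
    pub fn new() -> Self {
        Self {
            slots: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the value for `key`. If several callers ask for the same key
    /// concurrently, only one of them runs `load`; the others block until the
    /// slot is filled and then reuse the result.
    pub fn fetch<F>(&self, key: &str, load: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        // Hold the map lock only long enough to find or create the slot,
        // so slow loads for one key do not block other keys.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            let slot = slots.entry(key.to_string()).or_default();
            Arc::clone(slot)
        };
        // `OnceLock::get_or_init` guarantees that a single initializer runs.
        slot.get_or_init(load).clone()
    }
}
```

In this sketch, several threads calling `fetch` for the same key share one load. Eviction, loader failures, and the miss-reason handling that the first bullet attributes to `fetch()` are deliberately left out.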
+
+Some of these capabilities overlap with the functionality provided by a
general cache layer, while others are orthogonal to it. An independent *foyer*
integration (e.g. `FoyerLayer`) can fully leverage *foyer*'s capabilities. At the
same time, it does not prevent future integration of *foyer* and other cache
libraries through the general cache layer.
+
+# Guide-level explanation
+
+## 1. Enable feature
+
+```toml
+opendal = { version = "*", features = ["layers-foyer"] }
+```
+
+## 2. Build foyer instance
+
+```rust
+let cache = HybridCacheBuilder::new()
+ .memory(10)
+ .with_shards(1)
+ .storage(Engine::Large(LargeEngineOptions::new()))
+ .with_device_options(
+ DirectFsDeviceOptions::new(dir.path())
+ .with_capacity(16 * MiB as usize)
+ .with_file_size(1 * MiB as usize),
+ )
+ .with_recover_mode(RecoverMode::None)
+ .build()
+ .await
+ .unwrap();
+```
+
+## 3. Build OpenDAL operator with foyer layer
+
+```rust
+let op = Operator::new(Dashmap::default())
+ .unwrap()
+ .layer(FoyerLayer::new(cache.clone()))
+ .finish();
+```
+
+## 4. Perform operations as you used to
+
+```rust
+// `write` takes the path and the data; "hello" is placeholder content.
+op.write("obj-1", "hello").await.unwrap();
+
+assert_eq!(op.list("/").await.unwrap().len(), 1);
+
+op.read("obj-1").await.unwrap();
+
+op.delete("obj-1").await.unwrap();
+
+assert!(op.list("/").await.unwrap().is_empty());
+```
+
+# Reference-level explanation
+
+As mentioned in the previous section, this RFC aims to integrate *foyer* to
fully leverage its capabilities, rather than designing a generic cache layer.
Therefore, a transparent integration can be achieved through a `FoyerLayer`.
+
+`FoyerLayer` holds references to both the internal accessor and a *foyer*
instance. For operations that *foyer* supports with compatible behavior,
`FoyerLayer` uses *foyer* to handle requests, accessing the internal
accessor as needed. For operations that *foyer* cannot support, it
automatically falls back to the internal accessor implementation.
+
+The following operations involve *foyer*:
+
+- `read`: Read from the *foyer* hybrid cache. If the hybrid cache misses, fall
back to the internal accessor's `read` operation.
+ - For range reads, *foyer* fetches and caches the whole object and returns
the requested range. (Future versions may support user configuration of
whether to cache the entire object or only the part covered by the range.)
+- `write`: Insert into the hybrid cache when the internal accessor's `write`
operation succeeds.
+- `delete`: Delete the object from the *foyer* hybrid cache regardless of
whether the internal accessor's `delete` operation succeeds.
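+
The three rules above can be sketched with plain in-memory maps. This is only an illustration under stated assumptions, not the `FoyerLayer` implementation: `Backend` and `CachedStore` are hypothetical names, a `HashMap` stands in for both the internal accessor and the hybrid cache, and everything is synchronous.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for the internal accessor (the wrapped storage).
#[derive(Default)]
pub struct Backend {
    pub objects: HashMap<String, Vec<u8>>,
    /// Number of reads that reached the backend, kept for illustration.
    pub reads: usize,
}

/// Hypothetical cached store; the `HashMap` stands in for the hybrid cache.
pub struct CachedStore {
    pub inner: Backend,
    cache: HashMap<String, Vec<u8>>,
}

impl CachedStore {
    pub fn new(inner: Backend) -> Self {
        Self {
            inner,
            cache: HashMap::new(),
        }
    }

    /// On a miss, fetch and cache the whole object, then serve the range.
    pub fn read(&mut self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        if !self.cache.contains_key(path) {
            self.inner.reads += 1;
            let whole = self.inner.objects.get(path)?.clone();
            self.cache.insert(path.to_string(), whole);
        }
        let whole = self.cache.get(path)?;
        // Clamp the range to the object size instead of panicking.
        let end = range.end.min(whole.len());
        let start = range.start.min(end);
        Some(whole[start..end].to_vec())
    }

    /// Write to the backend first, then insert into the cache. The in-memory
    /// backend cannot fail; a real layer would insert only on success.
    pub fn write(&mut self, path: &str, data: Vec<u8>) {
        self.inner.objects.insert(path.to_string(), data.clone());
        self.cache.insert(path.to_string(), data);
    }

    /// Drop the cache entry regardless of the backend outcome, then delete
    /// from the backend. Returns whether the backend held the object.
    pub fn delete(&mut self, path: &str) -> bool {
        self.cache.remove(path);
        self.inner.objects.remove(path).is_some()
    }
}
```

In this sketch, two different range reads of the same object reach the backend once, because the first read caches the whole object. That is the trade-off behind the parenthetical note above: whole-object caching makes later ranges cheap but costs more on the first read of a large object.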
+
+# Drawbacks
+
+Since the layer cannot detect whether other users have updated the data in the
underlying storage system, introducing a cache in this case may lead to data
inconsistency. Therefore, the *foyer* integration is better suited to object
storage systems that do not support updating objects.
+
+# Rationale and alternatives
+
+[RFC#6297](https://github.com/apache/opendal/pull/6297) proposes a
general cache layer design, but that design cannot fully leverage *foyer*'s
capabilities. However, the two are not in conflict. Because #6297 has not
yet been finalized, I prefer to implement a layer specifically for *foyer*
first. This does not affect the future implementation of a general cache layer
and can also help quickly identify potential user needs and issues.
+
+# Prior art
+
+*Foyer* has already been applied in systems like RisingWave, ChromaDB, and
SlateDB. We can learn from this experience. Notably, both RisingWave and
SlateDB support using OpenDAL as the data access layer. This RFC will provide a
smoother experience for users with similar needs.
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+- Based on the experience of implementing the *foyer* layer, a more general
cache layer can be developed.
+- Adjust the API of *foyer* to align with the usage of OpenDAL, enhancing
compatibility between the two.