This is an automated email from the ASF dual-hosted git repository.

xuanwo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/opendal.git


The following commit(s) were added to refs/heads/main by this push:
     new 620643ac7 RFC-6370: foyer layer (#6370)
620643ac7 is described below

commit 620643ac71934360da443d9d9f8604fe3577fe67
Author: Croxx <[email protected]>
AuthorDate: Mon Jul 7 21:47:05 2025 +0800

    RFC-6370: foyer layer (#6370)
    
    * RFC: foyer layer
    
    Signed-off-by: MrCroxx <[email protected]>
    
    * chore: fix typos
    
    Signed-off-by: MrCroxx <[email protected]>
    
    * Update core/src/docs/rfcs/0000_foyer_integration.md
    
    ---------
    
    Signed-off-by: MrCroxx <[email protected]>
    Co-authored-by: Xuanwo <[email protected]>
---
 core/src/docs/rfcs/0000_foyer_integration.md | 111 +++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/core/src/docs/rfcs/0000_foyer_integration.md 
b/core/src/docs/rfcs/0000_foyer_integration.md
new file mode 100644
index 000000000..ba53d83e1
--- /dev/null
+++ b/core/src/docs/rfcs/0000_foyer_integration.md
@@ -0,0 +1,111 @@
+- Proposal Name: `foyer_integration`
+- Start Date: 2025-07-07
+- RFC PR: [apache/opendal#6370](https://github.com/apache/opendal/pull/6370)
+- Tracking Issue: 
[apache/opendal#6372](https://github.com/apache/opendal/issues/6372)
+
+# Summary
+
+Integrate [*foyer*](https://github.com/foyer-rs/foyer) hybrid support into 
OpenDAL for performance boost and cost reduction.
+
+# Motivation
+
+Object storage is the most commonly used option by OpenDAL users. In 
cloud-based Object Storage services like AWS S3 / GCS, the distribution of 
request latency is often one to several orders of magnitude higher than local 
disks or memory, and these services are often billed based on the number of 
requests. 
+
+Applications based on these cloud object storage services often need to 
introduce caching to optimize storage performance while reducing request 
overhead. *Foyer* provides a mixed caching capability of memory and disk, 
offering a better balance between performance and cost, thus becoming a 
dependency for many systems based on cloud object storage along with OpenDAL. 
e.g. RisingWave, SlateDB, etc.
+
+However, regardless of which cache component is introduced, users need to 
operate additional cache-related APIs apart from operating OpenDAL. If *foyer* 
can be integrated as an optional component into OpenDAL, it can provide users 
with a more friendly, convenient, and transparent interaction method.
+
+By introducing *foyer* integration, the users will be benefited in to 
following aspects:
+
+- Performance boost and cost reduction by caching with both memory and disk.
+- A completely transparent implementation, using the same operation APIs as 
before.
+
+[RFC#6297](https://github.com/apache/opendal/pull/6297) has mentioned a 
general cache layer design, and *foyer* can be integrated into OpenDAL as a 
general cache in this way. However, this may not fully leverage *foyer*'s 
capabilities:
+
+- *Foyer* support automatic cache refilling on cache miss. The behavior 
differs based on the reason of cache miss and the statistics of the entry (e.g. 
entry not in cache, disk operation throttled, age of entry, etc). All of the 
abilities are supported by a non-standard API `fetch()`, which other cache 
libraries don't have.
+- *Foyer* support requests deduplication on the same key. *Foyer* ensures that 
for concurrent access to the same key, only one request will actually access 
the disk cache or remote storage, while other requests will wait for this 
request to return and directly reuse the result, in order to minimize overhead 
as much as possible.
+
+These capabilities overlap with some of the functionalities provided by a 
general cache Layer, while others are orthogonal. An independent *foyer* 
integration (e.g. `FoyerLayer`) can fully leverage Foyer's capabilities. At the 
same time, this will not affect future integration with Foyer and other cache 
libraries through the general cache layer.
+
+# Guide-level explanation
+
+## 1. Enable feature
+
+```toml
+opendal = { version = "*", features = ["layers-foyer"] }
+```
+
+## 2. Build foyer instance
+
+```rust
+let cache = HybridCacheBuilder::new()
+    .memory(10)
+    .with_shards(1)
+    .storage(Engine::Large(LargeEngineOptions::new()))
+    .with_device_options(
+        DirectFsDeviceOptions::new(dir.path())
+            .with_capacity(16 * MiB as usize)
+            .with_file_size(1 * MiB as usize),
+    )
+    .with_recover_mode(RecoverMode::None)
+    .build()
+    .await
+    .unwrap();
+```
+
+## 3. Build OpenDAL operator with foyer layer
+
+```rust
+let op = Operator::new(Dashmap::default())
+    .unwrap()
+    .layer(FoyerLayer::new(cache.clone()))
+    .finish();
+```
+
+## 4. Perform operations as you used to
+
+```rust
+op.write("obj-1").await.unwrap();
+
+assert_eq!(op.list("/").await.unwrap().len(), 1);
+
+op.read("obj-1").await.unwrap();
+
+op.delete("obj-1").await.unwrap();
+
+assert!(op.list("/").await.unwrap().is_empty());
+```
+
+# Reference-level explanation
+
+As mentioned in the previous section, this RFC aims to integrate *foyer* to 
fully leverage its capabilities, rather than designing a generic cache layer. 
Therefore, a transparent integration can be achieved through a `FoyerLayer`.
+
+`FoyerLayer` holds both the reference of both internal accessor, and a *foyer* 
instance. For operations supported by *foyer* and compatible in behavior, the 
`FoyerLayer` will use *foyer* to handle requests, accessing the internal 
accessor as needed. For operations that *foyer* cannot support, it will 
automatically fallback to using the internal accessor implementation.
+
+Here are the details of operations that involve *foyer* operation:
+
+- `read`: Read from *foyer* hybrid cache, if the hybrid cache misses, fallback 
to internal accessor `read` operation.
+    - For range get, *foyer* caches and fetches the whole object and returns 
the requested object range. (In future versions, it may be possible to support 
user configuration for whether to cache the entire object or only the objects 
covered by the range.)
+- `write`: Insert hybrid cache on internal accessor `write` operation success.
+- `delete`: Delete object from *foyer* hybrid cache regardless of internal 
accessor `delete` operation success.
+
+# Drawbacks
+
+Since we cannot perceive whether other users have updated the data in the 
underlying storage system, introducing a cache in this case may lead to data 
inconsistency. Therefore, the integration of Foyer is more suitable for object 
storage systems that do not support updating objects.
+
+# Rationale and alternatives
+
+[RFC#6297](https://github.com/apache/opendal/pull/6297) has mentioned a 
general cache layer design, but cannot fully leverage *foyer*'s capabilities. 
However, the two are not in conflict.  At the same time, because #6297 has not 
yet been finalized, I prefer to implement a layer specifically for the *foyer* 
first. This does not affect the future implementation of a general cache layer 
and can also help quickly identify potential user needs and issues.
+
+# Prior art
+
+*Foyer* has already been applied in systems like RisingWave, ChromaDB, and 
SlateDB. We can learn from this experience. Notably, both RisingWave and 
SlateDB support using OpenDAL as the data access layer. This RFC will provide a 
smoother experience for users with similar needs.
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+- Based on the experience of implementing the *foyer* layer, a more general 
cache layer can be developed.
+- Adjust the API of *foyer* to align with the usage of OpenDAL, enhancing 
compatibility between the two.

Reply via email to