This is an automated email from the ASF dual-hosted git repository.

xuanwo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-opendal.git


The following commit(s) were added to refs/heads/main by this push:
     new 63cfcf050 RFC-2779: List With Metakey (#2779)
63cfcf050 is described below

commit 63cfcf05081ead1aacd96656f8e0a216d702fed9
Author: Xuanwo <[email protected]>
AuthorDate: Mon Aug 7 14:08:57 2023 +0800

    RFC-2779: List With Metakey (#2779)
    
    * rfc: List With Metakey
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Rename
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * assign number
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Fix typo
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Fix date
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Fix summary
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Fix
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Fix
    
    Signed-off-by: Xuanwo <[email protected]>
    
    * Add more context on how metakey works
    
    Signed-off-by: Xuanwo <[email protected]>
    
    ---------
    
    Signed-off-by: Xuanwo <[email protected]>
---
 core/src/docs/rfcs/2779_list_with_metakey.md | 143 +++++++++++++++++++++++++++
 core/src/docs/rfcs/mod.rs                    |   3 +
 2 files changed, 146 insertions(+)

diff --git a/core/src/docs/rfcs/2779_list_with_metakey.md b/core/src/docs/rfcs/2779_list_with_metakey.md
new file mode 100644
index 000000000..783e3cb5b
--- /dev/null
+++ b/core/src/docs/rfcs/2779_list_with_metakey.md
@@ -0,0 +1,143 @@
+- Proposal Name: `list_with_metakey`
+- Start Date: 2023-08-04
+- RFC PR: [apache/incubator-opendal#2779](https://github.com/apache/incubator-opendal/pull/2779)
+- Tracking Issue: [apache/incubator-opendal#2802](https://github.com/apache/incubator-opendal/issues/2802)
+
+# Summary
+
+Move the `Operator` `metadata` API to `list_with().metakey()` to simplify usage.
+
+# Motivation
+
+The current `Entry` metadata API is:
+
+```rust
+use opendal::Entry;
+use opendal::Metakey;
+
+let meta = op
+    .metadata(&entry, Metakey::ContentLength | Metakey::ContentType)
+    .await?;
+```
+
+This API is difficult to understand and rarely used correctly. In practice, users always fetch the same set of metadata during listing.
+
+Take the code of one of our users as an example:
+
+```rust
+let stream = self
+    .inner
+    .scan(&path)
+    .await
+    .map_err(|err| format_object_store_error(err, &path))?;
+
+let stream = stream.then(|res| async {
+    let entry = res.map_err(|err| format_object_store_error(err, ""))?;
+    let meta = self
+        .inner
+        .metadata(&entry, Metakey::ContentLength | Metakey::LastModified)
+        .await
+        .map_err(|err| format_object_store_error(err, entry.path()))?;
+
+    Ok(format_object_meta(entry.path(), &meta))
+});
+
+Ok(stream.boxed())
+```
+
+By moving metadata to `lister`, our user code can be simplified to:
+
+```rust
+let stream = self
+    .inner
+    .scan_with(&path)
+    .metakey(Metakey::ContentLength | Metakey::LastModified)
+    .await
+    .map_err(|err| format_object_store_error(err, &path))?;
+
+let stream = stream.then(|res| async {
+    let entry = res.map_err(|err| format_object_store_error(err, ""))?;
+    // The requested metadata was already fetched during listing, so borrow it from the entry.
+    let meta = entry.metadata();
+
+    Ok(format_object_meta(entry.path(), meta))
+});
+
+Ok(stream.boxed())
+```
+
+By introducing this change:
+
+- Users don't need to capture `Operator` in the closure.
+- Users don't need to make another async call such as `metadata()`.
+
+If we don't have this change:
+
+- Every place that could receive a `fn()` must use `Fn()` instead, which forces users to add a generic parameter to their code.
+- It's harder for other language bindings to implement `op.metadata()` correctly.
+
+# Guide-level explanation
+
+The new API will be:
+
+```rust
+let entries: Vec<Entry> = op
+  .list_with("dir")
+  .metakey(Metakey::ContentLength | Metakey::ContentType).await?;
+
+let meta: &Metadata = entries[0].metadata();
+```
+
+The metadata fetched during listing can be read directly from each entry via `metadata()`, or extracted later via `into_parts()`.
+
+# Reference-level explanation
+
+## How metakey works
+
+For every service, `stat` returns the full set of its metadata. For example, `s3` returns `ContentLength | ContentType | LastModified | ...`, and `fs` returns `ContentLength | LastModified`. Most services return only part of that metadata during `list`: `s3` returns `ContentLength` and `LastModified`, but `fs` returns none of them.
+
+So when users use `list` to list entries, they will get a list of entries with incomplete metadata. The metadata could be in three states:
+
+- Filled: the metadata is returned in `list`.
+- NotExist: the metadata is not supported by the service.
+- Unknown: the metadata is supported by the service but not returned in `list`.
+
+By accepting `metakey`, we can compare the returned entry's metadata with the requested metakey:
+
+- Return the entry if the metakey is already met by `Filled` and `NotExist` metadata.
+- Send `stat` call to fetch the metadata if metadata is `Unknown`.
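
The three states and the resulting decision can be sketched with plain bit flags. This is a minimal illustration, not OpenDAL's implementation: the `u32` constants stand in for `Metakey`, and `MetaState`, `state_of`, and `needs_stat` are hypothetical names introduced only for this example.

```rust
// Hypothetical bit flags standing in for OpenDAL's `Metakey` (illustration only).
const CONTENT_LENGTH: u32 = 1 << 0;
const CONTENT_TYPE: u32 = 1 << 1;
const LAST_MODIFIED: u32 = 1 << 2;

/// The three states a single piece of metadata can be in after `list`.
#[derive(Debug, PartialEq)]
enum MetaState {
    Filled,
    NotExist,
    Unknown,
}

/// Classify one key (a single bit) for an entry returned by `list`.
/// `filled_by_list`: keys the list response carried.
/// `supported_by_stat`: keys the service can return via `stat`.
fn state_of(key: u32, filled_by_list: u32, supported_by_stat: u32) -> MetaState {
    if filled_by_list & key != 0 {
        MetaState::Filled
    } else if supported_by_stat & key == 0 {
        MetaState::NotExist
    } else {
        MetaState::Unknown
    }
}

/// A `stat` call is needed only if some requested key is still `Unknown`,
/// i.e. supported by the service but not filled by `list`.
fn needs_stat(requested: u32, filled_by_list: u32, supported_by_stat: u32) -> bool {
    requested & supported_by_stat & !filled_by_list != 0
}
```

With the `s3` and `fs` examples above, requesting `ContentType` on `s3` would trigger a `stat` call (the key is `Unknown`), while the same request on `fs` would not (the key is `NotExist`).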
+
+## Changes
+
+We will add `metakey` to `OpList`. Underlying services can use this information to try their best to fetch the metadata.
+
+There are the following possibilities:
+
+- The entry metadata is met: `Lister` returns the entry directly.
+- The entry metadata is not met and not fully filled: `Lister` will try to send a `stat` call to fetch the metadata.
+- The entry metadata is not met and fully filled: `Lister` will return the entry directly.
+
+To make sure we can handle all metadata correctly, we will add a new capability called `stat_complete_metakey`. This capability indicates the complete set of metadata that can be fetched via a `stat` call. For example, `s3` can set this capability to `ContentLength | ContentType | LastModified | ...`, while `fs` only has `ContentLength | LastModified`. `Lister` can use this capability to decide whether to send a `stat` call or not.
+
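
One way to read the three possibilities above is the following sketch. It is an assumption-laden illustration rather than the actual `Lister` code: `Capability`, `Action`, and `decide` are hypothetical names, the `u32` constants stand in for `Metakey`, and "fully filled" is interpreted as "the entry already carries every key in `stat_complete_metakey`".

```rust
// Hypothetical bit flags standing in for OpenDAL's `Metakey` (illustration only).
const CONTENT_LENGTH: u32 = 1 << 0;
const CONTENT_TYPE: u32 = 1 << 1;
const LAST_MODIFIED: u32 = 1 << 2;

/// Hypothetical stand-in for the service capability described above.
struct Capability {
    /// The complete set of metadata that a `stat` call can return.
    stat_complete_metakey: u32,
}

/// What the lister does with an entry it got back from the service.
#[derive(Debug, PartialEq)]
enum Action {
    ReturnDirectly,
    SendStat,
}

/// Decide whether an entry can be returned as-is or needs a `stat` call.
/// `requested`: the metakey passed by the user.
/// `filled`: the metadata the entry already carries.
fn decide(cap: &Capability, requested: u32, filled: u32) -> Action {
    // Met: every requested key is already present on the entry.
    let met = requested & !filled == 0;
    // Fully filled: `stat` could not add anything the entry lacks.
    let fully_filled = cap.stat_complete_metakey & !filled == 0;

    if met || fully_filled {
        Action::ReturnDirectly
    } else {
        Action::SendStat
    }
}
```

Under this reading, an `fs` entry that already holds `ContentLength | LastModified` is returned directly even when `ContentType` is requested, because a `stat` call could not supply it.
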
+Services' lister implementations should not need to change.
+
+# Drawbacks
+
+None
+
+# Rationale and alternatives
+
+Keeping the complex standalone API has limited benefit given low usage.
+
+# Prior art
+
+None
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+## Add glob and regex support for Lister
+
+We can add `glob` and `regex` support for `Lister` to make it more powerful.
diff --git a/core/src/docs/rfcs/mod.rs b/core/src/docs/rfcs/mod.rs
index 8f7c2be50..e5e58f4a8 100644
--- a/core/src/docs/rfcs/mod.rs
+++ b/core/src/docs/rfcs/mod.rs
@@ -151,3 +151,6 @@ pub mod rfc_2758_merge_append_into_write {}
 
 #[doc = include_str!("2774_lister_api.md")]
 pub mod rfc_2774_lister_api {}
+
+#[doc = include_str!("2779_list_with_metakey.md")]
+pub mod rfc_2779_list_with_metakey {}
