This is an automated email from the ASF dual-hosted git repository.
xuanwo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-opendal.git
The following commit(s) were added to refs/heads/main by this push:
new 63cfcf050 RFC-2779: List With Metakey (#2779)
63cfcf050 is described below
commit 63cfcf05081ead1aacd96656f8e0a216d702fed9
Author: Xuanwo <[email protected]>
AuthorDate: Mon Aug 7 14:08:57 2023 +0800
RFC-2779: List With Metakey (#2779)
* rfc: List With Metakey
Signed-off-by: Xuanwo <[email protected]>
* Rename
Signed-off-by: Xuanwo <[email protected]>
* assign number
Signed-off-by: Xuanwo <[email protected]>
* Fix typo
Signed-off-by: Xuanwo <[email protected]>
* Fix date
Signed-off-by: Xuanwo <[email protected]>
* Fix summary
Signed-off-by: Xuanwo <[email protected]>
* Fix
Signed-off-by: Xuanwo <[email protected]>
* Fix
Signed-off-by: Xuanwo <[email protected]>
* Add more context on how metakey works
Signed-off-by: Xuanwo <[email protected]>
---------
Signed-off-by: Xuanwo <[email protected]>
---
core/src/docs/rfcs/2779_list_with_metakey.md | 143 +++++++++++++++++++++++++++
core/src/docs/rfcs/mod.rs | 3 +
2 files changed, 146 insertions(+)
diff --git a/core/src/docs/rfcs/2779_list_with_metakey.md
b/core/src/docs/rfcs/2779_list_with_metakey.md
new file mode 100644
index 000000000..783e3cb5b
--- /dev/null
+++ b/core/src/docs/rfcs/2779_list_with_metakey.md
@@ -0,0 +1,143 @@
+- Proposal Name: `list_with_metakey`
+- Start Date: 2023-08-04
+- RFC PR:
[apache/incubator-opendal#2779](https://github.com/apache/incubator-opendal/pull/2779)
+- Tracking Issue:
[apache/incubator-opendal#2802](https://github.com/apache/incubator-opendal/issues/2802)
+
+# Summary
+
+Move `Operator` `metadata` API to `list_with().metakey()` to simplify the
usage.
+
+# Motivation
+
+The current `Entry` metadata API is:
+
+```rust
+use opendal::Entry;
+use opendal::Metakey;
+
+let meta = op
+ .metadata(&entry, Metakey::ContentLength | Metakey::ContentType)
+ .await?;
+```
+
+This API is difficult to understand and rarely used correctly. In practice,
users always fetch the same set of metadata during listing.
+
+Take one of our users' code as an example:
+
+```rust
+let stream = self
+ .inner
+ .scan(&path)
+ .await
+ .map_err(|err| format_object_store_error(err, &path))?;
+
+let stream = stream.then(|res| async {
+ let entry = res.map_err(|err| format_object_store_error(err, ""))?;
+ let meta = self
+ .inner
+ .metadata(&entry, Metakey::ContentLength | Metakey::LastModified)
+ .await
+ .map_err(|err| format_object_store_error(err, entry.path()))?;
+
+ Ok(format_object_meta(entry.path(), &meta))
+});
+
+Ok(stream.boxed())
+```
+
+By moving metadata to `lister`, our user's code can be simplified to:
+
+```rust
+let stream = self
+ .inner
+ .scan_with(&path)
+ .metakey(Metakey::ContentLength | Metakey::LastModified)
+ .await
+ .map_err(|err| format_object_store_error(err, &path))?;
+
+let stream = stream.then(|res| async {
+ let entry = res.map_err(|err| format_object_store_error(err, ""))?;
+ // Metadata was already fetched during listing; borrow it from the entry.
+ let meta = entry.metadata();
+
+ Ok(format_object_meta(entry.path(), meta))
+});
+
+Ok(stream.boxed())
+```
+
+By introducing this change:
+
+- Users don't need to capture `Operator` in the closure.
+- Users don't need to make another async call such as `metadata()`.
+
+If we don't have this change:
+
+- Every place that could receive a `fn()` must use `Fn()` instead, which
forces users to add a generic parameter to their code.
+- It's harder for other language bindings to implement `op.metadata()` correctly.
+
+# Guide-level explanation
+
+The new API will be:
+
+```rust
+let entries: Vec<Entry> = op
+ .list_with("dir")
+ .metakey(Metakey::ContentLength | Metakey::ContentType).await?;
+
+let meta: &Metadata = entries[0].metadata();
+```
+
+Metadata can be queried directly when listing entries via `metadata()`, and
later extracted via `into_parts()`.
+
+# Reference-level explanation
+
+## How metakey works
+
+For every service, `stat` returns the full set of its metadata. For
example, `s3` returns `ContentLength | ContentType | LastModified | ...`,
and `fs` returns `ContentLength | LastModified`. Most services return only
part of that metadata during `list`: `s3` returns `ContentLength` and
`LastModified`, but `fs` returns none of them.
+
+So when users use `list` to list entries, they will get a list of entries with
incomplete metadata. The metadata could be in three states:
+
+- Filled: the metadata is returned in `list`
+- NotExist: the metadata is not supported by the service.
+- Unknown: the metadata is supported by the service but not returned in `list`.
+
+By accepting `metakey`, we can compare the returned entry's metadata with
the metakey:
+
+- Return the entry if the metakey is already met by `Filled` and `NotExist`.
+- Send a `stat` call to fetch the metadata if any requested metadata is `Unknown`.
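
The three states above can be modeled in a few lines. This is only a minimal sketch to make the classification concrete: the `u8` bit constants, the `State` enum, and the `classify` function are hypothetical stand-ins introduced for illustration, not OpenDAL's real `Metakey` or `Metadata` types.

```rust
// Hypothetical stand-ins for `Metakey` flags (assumption, not OpenDAL's types).
const CONTENT_LENGTH: u8 = 1 << 0;
const CONTENT_TYPE: u8 = 1 << 1;
const LAST_MODIFIED: u8 = 1 << 2;

/// The three states a single metadata key can be in after `list`.
#[derive(Debug, PartialEq)]
enum State {
    /// The key was returned by `list`.
    Filled,
    /// The service does not support the key at all.
    NotExist,
    /// The service supports the key, but `list` did not return it.
    Unknown,
}

/// Classify one key for an entry returned by `list`.
/// `listed` is the set of keys filled during `list`;
/// `supported` is the set of keys the service can return via `stat`.
fn classify(key: u8, listed: u8, supported: u8) -> State {
    if (listed & key) != 0 {
        State::Filled
    } else if (supported & key) == 0 {
        State::NotExist
    } else {
        State::Unknown
    }
}
```

With an `s3`-like service that lists `CONTENT_LENGTH | LAST_MODIFIED` and supports all three keys, `CONTENT_TYPE` classifies as `Unknown`, so a `stat` call is needed only when the caller asks for it. With an `fs`-like service that supports only `CONTENT_LENGTH | LAST_MODIFIED`, `CONTENT_TYPE` classifies as `NotExist` and never triggers a `stat`.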
+
+## Changes
+
+We will add `metakey` into `OpList`. Underlying services can use this
information to try their best to fetch the metadata.
+
+There are the following possibilities:
+
+- The entry metadata is met: `Lister` returns the entry directly.
+- The entry metadata is not met and not fully filled: `Lister` will try to
send a `stat` call to fetch the metadata.
+- The entry metadata is not met and fully filled: `Lister` will return the
entry directly.
+
+To make sure we can handle all metadata correctly, we will add a new
capability called `stat_complete_metakey`. This capability will be used to
indicate the complete set of metadata that can be fetched via `stat` call. For
example, `s3` can set this capability to `ContentLength | ContentType |
LastModified | ...`, and `fs` only has `ContentLength | LastModified`.
`Lister` can use this capability to decide whether to send `stat` call or not.
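
One way to read the three possibilities together with `stat_complete_metakey` is sketched below. It assumes metakeys behave like bit flags and that "fully filled" means the entry already carries every key in `stat_complete_metakey`. The constants, the `Action` enum, and the `decide` function are hypothetical names used only for illustration, not part of OpenDAL.

```rust
// Hypothetical stand-ins for `Metakey` flags (assumption, not OpenDAL's types).
const CONTENT_LENGTH: u8 = 1 << 0;
const CONTENT_TYPE: u8 = 1 << 1;
const LAST_MODIFIED: u8 = 1 << 2;

/// What the lister does with an entry it has just received.
#[derive(Debug, PartialEq)]
enum Action {
    ReturnEntry,
    SendStat,
}

/// Decide whether a `stat` call is needed for one entry.
/// `requested` is the metakey passed by the caller, `filled` is the set of
/// keys already present on the entry, and `stat_complete` models the
/// `stat_complete_metakey` capability of the service.
fn decide(requested: u8, filled: u8, stat_complete: u8) -> Action {
    // Met: every requested key is already present on the entry.
    let met = (filled & requested) == requested;
    // Fully filled: the entry already holds everything `stat` could return,
    // so another `stat` call cannot add anything.
    let fully_filled = (filled & stat_complete) == stat_complete;

    if met || fully_filled {
        Action::ReturnEntry
    } else {
        Action::SendStat
    }
}
```

Under these assumptions, an `fs`-like entry that has already been completed by `stat` is returned directly even when the caller asks for `CONTENT_TYPE`, because no further call could supply it.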
+
+Services' lister implementations should not change.
+
+# Drawbacks
+
+None
+
+# Rationale and alternatives
+
+Keeping the complex standalone API has limited benefit given low usage.
+
+# Prior art
+
+None
+
+# Unresolved questions
+
+None
+
+# Future possibilities
+
+## Add glob and regex support for Lister
+
+We can add `glob` and `regex` support for `Lister` to make it more powerful.
diff --git a/core/src/docs/rfcs/mod.rs b/core/src/docs/rfcs/mod.rs
index 8f7c2be50..e5e58f4a8 100644
--- a/core/src/docs/rfcs/mod.rs
+++ b/core/src/docs/rfcs/mod.rs
@@ -151,3 +151,6 @@ pub mod rfc_2758_merge_append_into_write {}
#[doc = include_str!("2774_lister_api.md")]
pub mod rfc_2774_lister_api {}
+
+#[doc = include_str!("2779_list_with_metakey.md")]
+pub mod rfc_2779_list_with_metakey {}