m09526 opened a new issue, #380:
URL: https://github.com/apache/arrow-rs-object-store/issues/380

   **Which part is this question about**
   
   We have developed two `ObjectStore` wrapper implementations in a similar style to `LimitStore`, `PrefixStore`, etc., which we would like to contribute to this repository if they are wanted. Based on the descriptions below, is there enough of a use case to justify opening two PRs for these implementations?
   
   **Describe your question**
   The two implementations are as follows:
   
   **`LoggingObjectStore`**
   Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/3ffa40ba33172365e78fe138c7d4f477550848f0/rust/compaction/src/store.rs#L72).
   
   `LoggingObjectStore` wraps another `ObjectStore` implementation and writes all operations (GET, PUT, LIST, etc.) to Rust's standard logger.
   
   This was extremely helpful when debugging an application we had written that requested many files from Amazon S3 in small chunks. Being able to see exactly when, where, and which file parts were being requested by our application gave us a much better understanding of what was going on deep inside some library code.
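   As a rough sketch of the wrapping pattern (the struct name follows our implementation, but only one method is shown for illustration; a full `ObjectStore` impl would delegate every trait method in the same way):

```rust
use std::sync::Arc;

use log::debug;
use object_store::{path::Path, GetResult, ObjectStore, Result};

/// Hypothetical wrapper that logs each operation before delegating to the
/// inner store.
pub struct LoggingObjectStore {
    inner: Arc<dyn ObjectStore>,
}

impl LoggingObjectStore {
    pub fn new(inner: Arc<dyn ObjectStore>) -> Self {
        Self { inner }
    }

    /// Log the request, then delegate; every other operation would follow
    /// the same pattern.
    pub async fn get(&self, location: &Path) -> Result<GetResult> {
        debug!("GET {location}");
        self.inner.get(location).await
    }
}
```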
   
   
   **`ReadaheadStore`**
   Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/develop/rust/compaction/src/readahead.rs).
   
   This wraps another `ObjectStore` and attempts to re-use open data streams on an object during GET operations. Inspired by the functionality in [Apache Hadoop's Hadoop-AWS module](https://hadoop.apache.org/docs/r3.4.1/hadoop-aws/tools/hadoop-aws/index.html), the readahead store keeps a data stream open once a client has finished reading from it. When a new GET operation is requested on the same object and the starting read position is within a configurable distance of the last read, the existing stream is re-used instead of opening a new one.
   
   This can drastically reduce the number of new GET operations that have to be started against the wrapped object store when objects are read sequentially, which can improve performance because fewer network requests are made.
   
   If a new GET operation is requested that starts before the position of a previous stream, or too far beyond it, a new request is made against the underlying store (see the sketch below). The number of concurrent open streams per object, the maximum time-to-live, and the maximum safe "readahead" distance are all configurable.
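   As a rough illustration of the reuse decision only (the type and function names below are invented for this sketch rather than taken verbatim from the linked code): a new GET re-uses an open stream when it starts at or after the stream's current position and within the configured readahead distance.

```rust
/// Hypothetical configuration mirroring the options described above.
pub struct ReadaheadConfig {
    /// Maximum number of bytes a new GET may start beyond the current stream
    /// position while still re-using that stream (skipping forward over the
    /// intervening bytes).
    pub max_readahead: u64,
}

/// Decide whether an open stream currently positioned at `stream_pos` can
/// serve a new GET starting at `request_start`. Requests that start before
/// the stream position, or too far beyond it, need a fresh GET against the
/// wrapped store.
fn can_reuse(stream_pos: u64, request_start: u64, cfg: &ReadaheadConfig) -> bool {
    request_start >= stream_pos && request_start - stream_pos <= cfg.max_readahead
}

fn main() {
    let cfg = ReadaheadConfig { max_readahead: 64 * 1024 };
    assert!(can_reuse(1_000, 1_500, &cfg));    // small forward skip: re-use stream
    assert!(!can_reuse(1_000, 500, &cfg));     // backwards seek: new request
    assert!(!can_reuse(1_000, 200_000, &cfg)); // beyond readahead window: new request
}
```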
   
   **Additional context**
   We would like to contribute these two implementations to the wider community.

