m09526 opened a new issue, #380: URL: https://github.com/apache/arrow-rs-object-store/issues/380
**Which part is this question about**

We have developed two `ObjectStore` wrapper implementations in a similar style to `LimitStore`, `PrefixStore`, etc., which we would like to contribute to this repository if they are desired. Based on the description below, would there be enough of a use case to justify two PRs for these implementations?

**Describe your question**

The two implementations are as follows:

***

`LoggingObjectStore`

Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/3ffa40ba33172365e78fe138c7d4f477550848f0/rust/compaction/src/store.rs#L72). `LoggingObjectStore` wraps another `ObjectStore` implementation and writes all operations (GET, PUT, LIST, etc.) to Rust's standard logger. This was extremely helpful when debugging an application we had written that requested many files from Amazon S3 in small chunks: being able to see exactly when, where, and which file parts were being requested gave us a much better understanding of what was happening deep inside some library code. (A minimal sketch of the wrapper pattern is included at the end of this issue.)

***

`ReadaheadStore`

Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/develop/rust/compaction/src/readahead.rs). This wraps another `ObjectStore` and attempts to reuse opened data streams on an object during GET operations. Inspired by the functionality in [Apache Hadoop's Hadoop-AWS module](https://hadoop.apache.org/docs/r3.4.1/hadoop-aws/tools/hadoop-aws/index.html), the readahead store keeps a data stream open once a client has finished reading from it. When a new GET operation is requested on the same object, if the starting read position is within a configurable distance of the end of the last read, the existing stream is reused instead of opening a new one (a sketch of this reuse decision is included at the end of this issue). This can drastically reduce the number of new GET operations issued against the wrapped object store during sequential reads of objects, improving performance by cutting the number of network requests. If a new GET operation starts before the position of a previous stream, or too far beyond it, a new request is performed on the underlying store. The number of concurrent open streams per object, the maximum time-to-live, and the maximum safe readahead distance are all configurable.

**Additional context**

We would like to contribute these two implementations to the wider community.
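
***

For illustration only, here is a minimal sketch of the delegating-wrapper pattern that `LoggingObjectStore` uses. This is not the real `object_store` API: `SimpleStore`, `MemoryStore`, and `LoggingStore` are simplified, synchronous stand-ins invented for this example (the actual `ObjectStore` trait is async and has many more methods, each of which the real wrapper delegates to its inner store in the same way after logging the call). The sketch assumes the `log` and `env_logger` crates.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Simplified stand-in for an object store trait (illustration only).
trait SimpleStore {
    fn get_range(&self, location: &str, range: Range<usize>) -> Option<Vec<u8>>;
}

/// In-memory store used purely to make the example runnable.
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl SimpleStore for MemoryStore {
    fn get_range(&self, location: &str, range: Range<usize>) -> Option<Vec<u8>> {
        self.objects.get(location).map(|data| data[range].to_vec())
    }
}

/// Wraps another store and logs every operation before delegating to it.
struct LoggingStore<T: SimpleStore> {
    inner: T,
}

impl<T: SimpleStore> SimpleStore for LoggingStore<T> {
    fn get_range(&self, location: &str, range: Range<usize>) -> Option<Vec<u8>> {
        // Record which object and which byte range was requested.
        log::info!("GET {} bytes {}..{}", location, range.start, range.end);
        self.inner.get_range(location, range)
    }
}

fn main() {
    env_logger::init(); // route `log` output to stderr for the demo

    let mut objects = HashMap::new();
    objects.insert("bucket/data.parquet".to_string(), vec![0u8; 1024]);

    let store = LoggingStore { inner: MemoryStore { objects } };
    let bytes = store.get_range("bucket/data.parquet", 0..16).unwrap();
    assert_eq!(bytes.len(), 16);
}
```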
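And here is a sketch of the stream-reuse decision at the heart of the readahead idea, under assumed names (`can_reuse_stream`, `stream_pos`, `request_start`, `max_readahead` are illustrative, not taken from the linked code): a cached stream positioned at `stream_pos` can serve a new GET starting at `request_start` only if the request does not begin before the stream's current position and skipping forward would not discard more than `max_readahead` bytes; otherwise a fresh GET is issued.

```rust
/// Decide whether a cached, already-open stream can serve a new ranged GET.
fn can_reuse_stream(stream_pos: u64, request_start: u64, max_readahead: u64) -> bool {
    request_start >= stream_pos && request_start - stream_pos <= max_readahead
}

fn main() {
    let max_readahead = 64 * 1024; // e.g. allow up to 64 KiB of forward skip

    // Sequential read: the next request starts exactly where the last one ended.
    assert!(can_reuse_stream(1_000, 1_000, max_readahead));

    // Small forward skip within the readahead window: reuse and discard the gap.
    assert!(can_reuse_stream(1_000, 5_000, max_readahead));

    // Backwards seek, or a jump beyond the window: open a fresh GET instead.
    assert!(!can_reuse_stream(1_000, 500, max_readahead));
    assert!(!can_reuse_stream(1_000, 1_000 + max_readahead + 1, max_readahead));

    println!("readahead reuse checks passed");
}
```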
