m09526 opened a new issue, #380:
URL: https://github.com/apache/arrow-rs-object-store/issues/380

   **Which part is this question about**
   
   We have developed two `ObjectStore` wrapper implementations in a similar style to `LimitStore`, `PrefixStore`, etc., which we would like to contribute to this repository if they are wanted. Based on the descriptions below, is there enough of a use case to justify opening two PRs for these implementations?
   
   **Describe your question**
   The two implementations are as follows:
   
   **`LoggingObjectStore`**
   Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/3ffa40ba33172365e78fe138c7d4f477550848f0/rust/compaction/src/store.rs#L72).
   
   `LoggingObjectStore` wraps another `ObjectStore` implementation and writes all operations (GET, PUT, LIST, etc.) to Rust's standard logger.
   
   This was extremely helpful when debugging an application we had written that requested many files from Amazon S3 in small chunks. Being able to see exactly when, where, and which file parts were being requested by our application gave us a much better understanding of what was going on deep inside some library code.
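   As a rough sketch of the wrapping pattern (the struct name follows our implementation, but only one method is shown for illustration; a full `ObjectStore` impl would delegate every trait method in the same way):

```rust
use std::sync::Arc;

use log::debug;
use object_store::{path::Path, GetResult, ObjectStore, Result};

/// Hypothetical wrapper that logs each operation before delegating to the
/// inner store.
pub struct LoggingObjectStore {
    inner: Arc<dyn ObjectStore>,
}

impl LoggingObjectStore {
    pub fn new(inner: Arc<dyn ObjectStore>) -> Self {
        Self { inner }
    }

    /// Log the request, then delegate; every other operation would follow
    /// the same pattern.
    pub async fn get(&self, location: &Path) -> Result<GetResult> {
        debug!("GET {location}");
        self.inner.get(location).await
    }
}
```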
   
   
   **`ReadaheadStore`**
   Our implementation, on which a PR would be based, is [here](https://github.com/gchq/sleeper/blob/develop/rust/compaction/src/readahead.rs).
   
   This wraps another `ObjectStore` and attempts to re-use open data streams on an object during GET operations. Inspired by the functionality in [Apache Hadoop's Hadoop-AWS module](https://hadoop.apache.org/docs/r3.4.1/hadoop-aws/tools/hadoop-aws/index.html), the readahead store keeps a data stream open once a client has finished reading from it. When a new GET operation is requested on the same object and the starting read position is within a configurable distance of the last read, the existing stream is re-used instead of opening a new one.
   
   This can drastically reduce the number of new GET operations that have to be started against the wrapped object store when objects are read sequentially, which can improve performance because fewer network requests are made.
   
   If a new GET operation is requested that starts before the position of a previous stream, or too far beyond it, a new request is made against the underlying store (see the sketch below). The number of concurrent open streams per object, the maximum time-to-live, and the maximum safe "readahead" distance are all configurable.
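   As a rough illustration of the reuse decision only (the type and function names below are invented for this sketch rather than taken verbatim from the linked code): a new GET re-uses an open stream when it starts at or after the stream's current position and within the configured readahead distance.

```rust
/// Hypothetical configuration mirroring the options described above.
pub struct ReadaheadConfig {
    /// Maximum number of bytes a new GET may start beyond the current stream
    /// position while still re-using that stream (skipping forward over the
    /// intervening bytes).
    pub max_readahead: u64,
}

/// Decide whether an open stream currently positioned at `stream_pos` can
/// serve a new GET starting at `request_start`. Requests that start before
/// the stream position, or too far beyond it, need a fresh GET against the
/// wrapped store.
fn can_reuse(stream_pos: u64, request_start: u64, cfg: &ReadaheadConfig) -> bool {
    request_start >= stream_pos && request_start - stream_pos <= cfg.max_readahead
}

fn main() {
    let cfg = ReadaheadConfig { max_readahead: 64 * 1024 };
    assert!(can_reuse(1_000, 1_500, &cfg));    // small forward skip: re-use stream
    assert!(!can_reuse(1_000, 500, &cfg));     // backwards seek: new request
    assert!(!can_reuse(1_000, 200_000, &cfg)); // beyond readahead window: new request
}
```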
   
   **Additional context**
   We would like to contribute these two implementations to the wider community.

