[
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rok Mihevc updated ARROW-4294:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/20867
> [Plasma] Add support for evicting objects to external store
> -----------------------------------------------------------
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, C++ - Plasma
> Affects Versions: 0.11.1
> Reporter: Anurag Khandelwal
> Assignee: Anurag Khandelwal
> Priority: Minor
> Labels: features, pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts
> objects by deleting them from the Plasma store. This is a problem when it
> isn't possible to reconstruct the object or reconstructing it is expensive.
> Adding support for a pluggable external store that Plasma can evict objects
> to will address this issue.
> My proposal is described below.
> *Requirements*
> * Objects in Plasma should be evicted to an external store rather than being
> removed altogether
> * Communication with the external storage service should go through a very
> thin shim interface. At the same time, the interface should be general
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
> * Should be pluggable (e.g., it should be simple to add in or remove the
> external storage service for eviction, switch between different remote
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
> * The external store has practically infinite storage
> * The external store's write operation is idempotent and atomic; this is
> needed to ensure there are no race conditions due to multiple concurrent
> evictions of the same object.
> *Proposed Implementation*
> * Define an ExternalStore interface with a Connect call. The call returns an
> ExternalStoreHandle that exposes Put and Get calls. Any external store that
> needs to be supported has to implement this interface.
> * In order to read or write data to the external store in a thread-safe
> manner, one ExternalStoreHandle should be created per-thread. While the
> ExternalStoreHandle itself is not required to be thread-safe, multiple
> ExternalStoreHandles across multiple threads should be able to modify the
> external store in a thread-safe manner. These handles are most likely going
> to be wrappers around the external store client interfaces.
> * Replace the DeleteObjects method in the Plasma Store with an EvictObjects
> method. If an external store is specified for the Plasma store, the
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the
> object data to the external store (via the ExternalStoreHandle) and reclaim
> the memory associated with the object data/metadata rather than remove the
> entry from the Object Table altogether. In case there is no valid external
> store, the eviction path would remain the same (i.e., the object entry is
> still deleted from the Object Table).
> * The Get method in the Plasma Store now tries to fetch the object from the
> external store if it is not found locally and there is an external store
> associated with the Plasma Store. The method tries to offload this to an
> external worker thread pool with a fire-and-forget model, but may need to do
> this synchronously if there are too many requests already enqueued.
> * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES,
> which can be appended to with implementations of the ExternalStore and
> ExternalStoreHandle interfaces, which will then be compiled into the
> plasma_store_server executable.
>
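The proposed implementation above can be illustrated with a minimal,
single-threaded sketch. It is not the code from the actual patch. Only the
names ExternalStore, ExternalStoreHandle, Connect, Put, Get, EvictObjects and
PLASMA_EVICTED come from the proposal; the signatures, the InMemoryStore
backend, the MiniPlasmaStore class, the PLASMA_SEALED state name, and the use
of std::string for object IDs and payloads are assumptions made for
illustration. The worker thread pool on the Get path is omitted, so fetches
happen synchronously.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ObjectID = std::string;  // assumption: stand-in for Plasma's ObjectID

// Per-thread handle; Put is expected to be idempotent and atomic.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual bool Put(const ObjectID& id, const std::string& data) = 0;
  virtual bool Get(const ObjectID& id, std::string* out) = 0;
};

class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  virtual std::unique_ptr<ExternalStoreHandle> Connect(
      const std::string& endpoint) = 0;
};

// Hypothetical backend: a mutex-guarded map shared by all of its handles.
class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string&) override {
    return std::unique_ptr<ExternalStoreHandle>(new Handle(this));
  }

 private:
  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(InMemoryStore* store) : store_(store) {}
    bool Put(const ObjectID& id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(store_->mu_);
      store_->objects_[id] = data;  // overwriting makes repeated Puts idempotent
      return true;
    }
    bool Get(const ObjectID& id, std::string* out) override {
      std::lock_guard<std::mutex> lock(store_->mu_);
      auto it = store_->objects_.find(id);
      if (it == store_->objects_.end()) return false;
      *out = it->second;
      return true;
    }

   private:
    InMemoryStore* store_;
  };
  std::mutex mu_;
  std::unordered_map<ObjectID, std::string> objects_;
};

enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

struct ObjectEntry {
  ObjectState state;
  std::string data;  // emptied once the object has been evicted
};

// Hypothetical, heavily simplified stand-in for the Plasma store.
class MiniPlasmaStore {
 public:
  explicit MiniPlasmaStore(std::unique_ptr<ExternalStoreHandle> handle = nullptr)
      : handle_(std::move(handle)) {}

  void Seal(const ObjectID& id, std::string data) {
    ObjectEntry& entry = table_[id];
    entry.state = ObjectState::PLASMA_SEALED;
    entry.data = std::move(data);
  }

  void EvictObjects(const std::vector<ObjectID>& ids) {
    for (const ObjectID& id : ids) {
      auto it = table_.find(id);
      if (it == table_.end()) continue;
      if (it->second.state == ObjectState::PLASMA_EVICTED) continue;
      if (handle_ && handle_->Put(id, it->second.data)) {
        // Keep the table entry, but release the in-memory payload.
        it->second.state = ObjectState::PLASMA_EVICTED;
        std::string().swap(it->second.data);
      } else {
        // No external store: delete as before. Deleting after a failed
        // Put is an assumption; the proposal does not cover that case.
        table_.erase(it);
      }
    }
  }

  bool Get(const ObjectID& id, std::string* out) {
    assert(out != nullptr);
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      // Synchronous fetch from the external store.
      if (!handle_ || !handle_->Get(id, &it->second.data)) return false;
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *out = it->second.data;
    return true;
  }

  bool IsEvicted(const ObjectID& id) const {
    auto it = table_.find(id);
    return it != table_.end() &&
           it->second.state == ObjectState::PLASMA_EVICTED;
  }

 private:
  std::unique_ptr<ExternalStoreHandle> handle_;
  std::unordered_map<ObjectID, ObjectEntry> table_;
};
```

In this sketch, a store constructed with a handle from
InMemoryStore::Connect keeps an evicted object's table entry and restores the
payload on the next Get, while a store constructed without a handle simply
drops the entry on eviction, matching the fallback path in the proposal.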