[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Component/s: Plasma (C++)

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Plasma (C++)
> Affects Versions: 0.11.1
> Reporter: Anurag Khandelwal
> Priority: Minor
> Labels: features, pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to an external store rather than being
> removed altogether
>  * Communication to the external storage service should be through a very 
> thin, shim interface. At the same time, the interface should be general 
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed to ensure there are no race conditions due to multiple concurrent
> evictions of the same object.
> *Proposed Implementation*
>  * Define an ExternalStore interface with a Connect call. The call returns an
> ExternalStoreHandle that exposes Put and Get calls. Any external store that
> needs to be supported has to have this interface implemented.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per-thread. While the 
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  
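
To make the proposed interface concrete, here is a minimal C++ sketch of what the ExternalStore and ExternalStoreHandle pair might look like. Only the names ExternalStore, ExternalStoreHandle, Connect, Put and Get come from the description above. The signatures, the StoreStatus enum, the string-typed object IDs and the InMemoryStore backend are assumptions made for illustration and are not the actual Plasma API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical status type; real Plasma code would more likely return arrow::Status.
enum class StoreStatus { kOk, kNotFound };

// One handle is meant to be created per thread. A single handle does not
// have to be thread-safe, but handles on different threads must be able to
// modify the same external store safely.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual StoreStatus Put(const std::string& object_id, const std::string& data) = 0;
  virtual StoreStatus Get(const std::string& object_id, std::string* data) = 0;
};

// The pluggable part: each backend implements Connect and returns a handle.
class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  virtual std::unique_ptr<ExternalStoreHandle> Connect(const std::string& endpoint) = 0;
};

// Toy backend (an assumption, for illustration only): a mutex-guarded map
// shared by all handles, standing in for a remote service.
class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string& endpoint) override {
    (void)endpoint;  // a real backend would open a client connection here
    return std::unique_ptr<ExternalStoreHandle>(new Handle(state_));
  }

 private:
  struct State {
    std::mutex mutex;
    std::map<std::string, std::string> objects;
  };

  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(std::shared_ptr<State> state) : state_(std::move(state)) {}

    // Overwriting under a lock keeps Put atomic and idempotent: writing the
    // same object twice leaves the store in the same state.
    StoreStatus Put(const std::string& object_id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(state_->mutex);
      state_->objects[object_id] = data;
      return StoreStatus::kOk;
    }

    StoreStatus Get(const std::string& object_id, std::string* data) override {
      std::lock_guard<std::mutex> lock(state_->mutex);
      auto it = state_->objects.find(object_id);
      if (it == state_->objects.end()) return StoreStatus::kNotFound;
      *data = it->second;
      return StoreStatus::kOk;
    }

   private:
    std::shared_ptr<State> state_;
  };

  std::shared_ptr<State> state_ = std::make_shared<State>();
};
```

In this sketch each worker thread would call Connect once and keep its own handle. Two concurrent evictions of the same object reduce to two identical Put calls, which is why the proposal assumes idempotent, atomic writes.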



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4294:
--
Labels: features pull-request-available  (was: features)

> [Plasma] Add support for evicting objects to external store
> ---
>
> Key: ARROW-4294
> URL: https://issues.apache.org/jira/browse/ARROW-4294
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Affects Versions: 0.11.1
> Reporter: Anurag Khandelwal
> Priority: Minor
> Labels: features, pull-request-available
> Fix For: 0.13.0





[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Description: 
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent evictions
of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner. These handles are most likely going to 
be wrappers around the external store client interfaces.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specified for the Plasma store, the 
EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
object data to the external store (via the ExternalStoreHandle) and reclaim the 
memory associated with the object data/metadata rather than remove the entry 
from the Object Table altogether. In case there is no valid external store, the 
eviction path would remain the same (i.e., the object entry is still deleted 
from the Object Table).
 * The Get method in Plasma Store now tries to fetch the object from external 
store if it is not found locally and there is an external store associated with 
the Plasma Store. The method tries to offload this to an external worker thread 
pool with a fire-and-forget model, but may need to do this synchronously if 
there are too many requests already enqueued.
 * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which 
can be appended to with implementations of the ExternalStore and 
ExternalStoreHandle interfaces, which will then be compiled into the 
plasma_store_server executable.
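
The eviction and Get paths described in the list above can be sketched roughly as follows, in C++. This is an illustration under stated assumptions, not the Plasma implementation: ToyPlasmaStore, MapHandle, ObjectEntry and the PLASMA_SEALED state are hypothetical names, the object table is a plain std::map, object data is a std::string rather than shared memory, and the fetch from the external store is done synchronously (the worker thread pool mentioned in the proposal is left out). Only PLASMA_EVICTED and the overall behaviour come from the description.

```cpp
#include <cassert>
#include <map>
#include <string>

// PLASMA_EVICTED is named in the proposal; PLASMA_SEALED is an assumed
// name for "object data is resident in the store".
enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

struct ObjectEntry {
  ObjectState state;
  std::string data;  // stands in for the object's data/metadata buffers
};

// Hypothetical stand-in for an ExternalStoreHandle, backed by a map.
struct MapHandle {
  std::map<std::string, std::string> objects;

  void Put(const std::string& id, const std::string& data) { objects[id] = data; }

  bool Get(const std::string& id, std::string* data) const {
    auto it = objects.find(id);
    if (it == objects.end()) return false;
    *data = it->second;
    return true;
  }
};

class ToyPlasmaStore {
 public:
  // Pass nullptr to model a store with no external store configured.
  explicit ToyPlasmaStore(MapHandle* external) : external_(external) {}

  void Create(const std::string& id, const std::string& data) {
    table_[id] = ObjectEntry{ObjectState::PLASMA_SEALED, data};
  }

  void EvictObject(const std::string& id) {
    auto it = table_.find(id);
    if (it == table_.end()) return;
    if (external_ == nullptr) {
      table_.erase(it);  // no external store: eviction still deletes the entry
      return;
    }
    // Mark as evicted, write the data out, then reclaim the local memory
    // while keeping the entry in the object table.
    it->second.state = ObjectState::PLASMA_EVICTED;
    external_->Put(id, it->second.data);
    it->second.data.clear();
    it->second.data.shrink_to_fit();
  }

  bool Get(const std::string& id, std::string* out) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      // Not resident locally: fall back to the external store.
      if (external_ == nullptr || !external_->Get(id, &it->second.data)) return false;
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *out = it->second.data;
    return true;
  }

  bool Contains(const std::string& id) const { return table_.count(id) > 0; }

 private:
  MapHandle* external_;
  std::map<std::string, ObjectEntry> table_;
};
```

With an external store attached, an evicted object stays in the object table and a later Get restores it. Without one, eviction behaves as before and the entry disappears.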

 

  was:
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent evictions
of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the 

[jira] [Updated] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Khandelwal updated ARROW-4294:
-
Description: 
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent evictions
of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is specified for the Plasma store, the 
EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
object data to the external store (via the ExternalStoreHandle) and reclaim the 
memory associated with the object data/metadata rather than remove the entry 
from the Object Table altogether. In case there is no valid external store, the 
eviction path would remain the same (i.e., the object entry is still deleted 
from the Object Table).
 * The Get method in Plasma Store now tries to fetch the object from external 
store if it is not found locally and there is an external store associated with 
the Plasma Store. The method tries to offload this to an external worker thread 
pool with a fire-and-forget model, but may need to do this synchronously if 
there are too many requests already enqueued.
 * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, which 
can be appended to with implementations of the ExternalStore and 
ExternalStoreHandle interfaces, which will then be compiled into the 
plasma_store_server executable.

 

  was:
Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it isn't 
possible to reconstruct the object or reconstructing it is expensive. Adding 
support for a pluggable external store that Plasma can evict objects to will 
address this issue. 

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being
removed altogether
 * Communication to the external storage service should be through a very thin, 
shim interface. At the same time, the interface should be general enough to 
support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add in or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent evictions
of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported has to have this interface implemented.
 * In order to read or write data to the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per-thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store in a thread-safe manner.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
method. If an external store is