Hi,

I'm looking into the possibility of creating a new kind of data store,
which we could call a federated data store, and wanted to see what
everyone thinks about it.

The basic idea is that the federated data store would allow more than one
data store to be configured for an Oak instance.  Oak would then choose
which data store to use based on one or more criteria, such as file size,
JCR path, node type, the existence or value of a node property, or a
combination of these.  In my thinking these criteria are defined in
configuration, so the federated data store knows which data store to use
for each binary.
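To make this a bit more concrete, here is a rough Java sketch of what
rule-based selection could look like: an ordered list of rules, each
pairing a condition on the binary (path, node type, size, properties) with
the name of a data store, where the first matching rule wins and anything
unmatched goes to a default store.  All class, record and method names are
hypothetical; nothing like them exists in Oak today, and in practice the
rules would be built from configuration rather than written in code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types exist in Oak; names are illustrative only.
public class FederatedSelectionSketch {

    /** Minimal description of a binary about to be written (hypothetical). */
    public record BinaryInfo(String jcrPath, String nodeType, long size,
                             Map<String, String> properties) {}

    /** One configured rule: if the condition matches, use the named data store. */
    public record Rule(String dataStoreName, Predicate<BinaryInfo> condition) {}

    private final List<Rule> rules = new ArrayList<>();
    private final String defaultDataStore;

    public FederatedSelectionSketch(String defaultDataStore) {
        this.defaultDataStore = defaultDataStore;
    }

    /** Rules are evaluated in the order they were added. */
    public FederatedSelectionSketch addRule(Rule rule) {
        rules.add(rule);
        return this;
    }

    /** Returns the data store of the first matching rule, or the default store. */
    public String select(BinaryInfo info) {
        for (Rule rule : rules) {
            if (rule.condition().test(info)) {
                return rule.dataStoreName();
            }
        }
        return defaultDataStore;
    }
}
```

A combined criterion is just a predicate that checks several things, for
example a JCR path prefix together with a property value.  Ordering the
rules keeps the behavior predictable when more than one rule could match.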

I think this is a step towards UC14 - Hierarchical BlobStore in [0].  Once
the federated data store is implemented, we should be able to support UC14
with little additional work.  I can also foresee other capabilities it
could offer, such as storing blobs for different node types in different
data stores, or choosing among several data stores based on geographic
location (UC2 in [0]).

Currently, IIUC, the decision whether to write a stream to the data store
delegate or to keep it in memory is made in DataStoreBlobStore.writeStream().
In my mind we could instead defer that decision directly to the delegate,
by adding a method to the appropriate interface (BlobStore or
GarbageCollectibleBlobStore) and giving it a default implementation in
AbstractBlobStore that bases the decision on the record size, which is the
current behavior.  All other existing data stores would then behave as they
do now.  For the federated data store, though, this decision would be more
involved: it would select the right data store based on configuration.
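Here is a rough Java sketch of that refactoring.  The interface, the
method chooseDestination() and all class names are hypothetical stand-ins
for BlobStore, AbstractBlobStore, the federated store and
DataStoreBlobStore; they are not real Oak APIs.  The default keeps the
size-based in-memory rule, the federated version additionally picks a
delegate from configured size bounds, and the writeStream() stand-in no
longer decides anything itself.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: these types stand in for BlobStore, AbstractBlobStore,
// the federated store and DataStoreBlobStore; none of the names are real Oak APIs.
public class DelegatedDecisionSketch {

    /** The hook that would be added to BlobStore or GarbageCollectibleBlobStore. */
    public interface DecidingBlobStore {
        /** Empty means "keep the record in memory"; otherwise the store to write to. */
        Optional<String> chooseDestination(long recordSize);
    }

    /** Stands in for AbstractBlobStore: the default keeps today's size-based rule. */
    public static class DefaultBlobStore implements DecidingBlobStore {
        private final String name;
        private final long minRecordLength;

        public DefaultBlobStore(String name, long minRecordLength) {
            this.name = name;
            this.minRecordLength = minRecordLength;
        }

        @Override
        public Optional<String> chooseDestination(long recordSize) {
            return recordSize < minRecordLength ? Optional.empty() : Optional.of(name);
        }
    }

    /** Stands in for the federated store: picks a delegate from configured size bounds. */
    public static class FederatedBlobStore implements DecidingBlobStore {
        private final long minRecordLength;
        private final String defaultStore;
        // Inclusive lower size bound -> name of the store handling records from there up.
        private final NavigableMap<Long, String> storesBySize = new TreeMap<>();

        public FederatedBlobStore(long minRecordLength, String defaultStore,
                                  Map<Long, String> storesBySize) {
            this.minRecordLength = minRecordLength;
            this.defaultStore = defaultStore;
            this.storesBySize.putAll(storesBySize);
        }

        @Override
        public Optional<String> chooseDestination(long recordSize) {
            if (recordSize < minRecordLength) {
                return Optional.empty(); // same in-memory rule as the default
            }
            Map.Entry<Long, String> bound = storesBySize.floorEntry(recordSize);
            return Optional.of(bound == null ? defaultStore : bound.getValue());
        }
    }

    /** Stands in for DataStoreBlobStore.writeStream(): it just asks the delegate. */
    public static String describeWrite(DecidingBlobStore delegate, long recordSize) {
        return delegate.chooseDestination(recordSize)
                .map(store -> "write to " + store)
                .orElse("keep in memory");
    }
}
```

Returning the destination rather than a plain boolean is one way to let
the federated store fold its selection into the same hook; a real design
would probably need more context than the record size (path, node type)
to apply the criteria listed earlier.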

The federated data store would need to exist independently of the other
data stores, so creating those data stores without introducing a code
dependency on them would be a challenge.
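One possible approach, sketched below in Java, is for the federated store
to depend only on a small interface and to instantiate the concrete data
stores by class name taken from configuration.  The DelegateStore
interface, the InMemoryDelegate class and the load() method are
hypothetical.  Since Oak is commonly deployed in OSGi, looking up the
delegates as OSGi services would probably be a more natural fit there;
this sketch uses plain reflection only to stay self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the federated store only knows the DelegateStore interface and
// creates concrete stores from configured class names; none of these names are Oak APIs.
public class ReflectiveDelegateLoader {

    /** Minimal contract the federated store would need from each delegate. */
    public interface DelegateStore {
        String describe();
    }

    /** Example implementation that would normally live in a separate module. */
    public static class InMemoryDelegate implements DelegateStore {
        public InMemoryDelegate() {}

        @Override
        public String describe() {
            return "in-memory";
        }
    }

    /** Instantiates each configured class through its public no-arg constructor. */
    public static Map<String, DelegateStore> load(Map<String, String> classNamesByStore) {
        Map<String, DelegateStore> stores = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : classNamesByStore.entrySet()) {
            try {
                Class<?> cls = Class.forName(entry.getValue());
                if (!DelegateStore.class.isAssignableFrom(cls)) {
                    throw new IllegalArgumentException(
                            entry.getValue() + " does not implement DelegateStore");
                }
                stores.put(entry.getKey(),
                        (DelegateStore) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException ex) {
                throw new IllegalStateException(
                        "Cannot create data store " + entry.getKey(), ex);
            }
        }
        return stores;
    }
}
```

The trade-off is that mistakes in the configured class names only show up
at runtime, which is why the sketch validates each class before using it.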


Please let me know what you think.  Is my idea about the implementation
flawed?  Is there a better way to accomplish this?  What concerns are there
about it?  I'd like to brainstorm with the list to find something that can
work in this area, and then I'll create a ticket for it.  Alternatively, I
can create the ticket now and we can have the discussion there.  Let me
know which is best.


[0] - https://wiki.apache.org/jackrabbit/JCR%20Binary%20Usecase


- Matt Ryan
