[ https://issues.apache.org/jira/browse/HDDS-15120 ]


    Ivan Andika deleted comment on HDDS-15120:
    ------------------------------------

was (Author: JIRAUSER298977):
Thanks [~peterxcli] for the paper

Came across another system https://github.com/dolthub/dolt 

> Support bucket forks / branches for agentic workload
> ----------------------------------------------------
>
>                 Key: HDDS-15120
>                 URL: https://issues.apache.org/jira/browse/HDDS-15120
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Ivan Andika
>            Assignee: Chu Cheng Li
>            Priority: Major
>
> Currently, Ozone supports bucket snapshots, which create a read-only, 
> immutable view of the entire bucket for use cases such as backup, 
> replication, compliance, etc. This is achieved using the RocksDB checkpoint 
> feature, which tracks the SST files that are current at that point.
> With the recent rise of agentic workloads, there is a need for storage 
> systems to implement forking / branching to cater for multi-agent workloads. 
> Unlike snapshots, forks can be mutated. The idea of forking and branching is 
> similar to Git branches / worktrees, where a new "branch" is created based on 
> the base directory. Multiple agents can fork the same base file system in 
> parallel and mutate these forks without affecting each other. These forks 
> should also be zero-copy, similar to snapshots (which should only require 
> O(1) time to create). Additionally, the lifetime of these forks can vary (a 
> fork can be retained for a long time or discarded quite quickly).
> Example systems
> * NeonDB branching: https://neon.com/docs/introduction/branching
> * Tigris Object Store: https://www.tigrisdata.com/docs/snapshots-and-forks/ 
> (please see the related blogs on the implementations of forks).
> Ozone can consider supporting this feature. As more systems adopt a 
> storage-compute separation architecture on top of object storage, the 
> compute / caching layer could rely on Ozone as the backing store for agentic 
> workloads if Ozone supported both snapshots and forking (these systems would 
> not need to implement snapshots and forking themselves, or write complicated 
> logic to store the state of their forks). Ozone could then position itself as 
> the open-source object store / distributed file system for agentic workloads.
> This ticket is meant to start a discussion in the community on this 
> direction. We can start thinking about this (and probably try to start 
> prototyping some ideas). This might require a radical change to the Ozone 
> Manager design (e.g. it might need to introduce versioning, reference 
> counting, copy-on-write, log subsystems, changes to OM deletion semantics, 
> etc).
> Future scope
> * Branch expiration: Clean up branches that have not been used for a long 
> time (users can also specify the expiration)
> * Branch archive: Move all the old branches to lower-cost storage (needs a 
> storage policy).
> Out of scope
> * Merging an Ozone fork back into the main bucket
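
To make the zero-copy, O(1) fork idea in the description above concrete, here is a minimal in-memory sketch of copy-on-write forking using layered overlays. Everything in it (the `Layer` and `Bucket` classes and their methods) is hypothetical and for illustration only; it is not an Ozone API and says nothing about how Ozone Manager would actually implement forks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Sentinel marking a key as deleted in an overlay layer (hypothetical).
_TOMBSTONE = object()


@dataclass
class Layer:
    """One overlay of key -> value. A layer that has a child is never mutated."""
    entries: Dict[str, object] = field(default_factory=dict)
    parent: Optional["Layer"] = None


class Bucket:
    """Hypothetical in-memory bucket used only to illustrate forking."""

    def __init__(self, top: Optional[Layer] = None) -> None:
        # The top layer is the only layer this bucket ever writes to.
        self._top = top if top is not None else Layer()

    def put(self, key: str, value: bytes) -> None:
        self._top.entries[key] = value

    def delete(self, key: str) -> None:
        # Record a tombstone so the key is hidden even if a parent layer has it.
        self._top.entries[key] = _TOMBSTONE

    def get(self, key: str) -> Optional[bytes]:
        # Walk from the newest layer to the oldest; the first hit wins.
        layer: Optional[Layer] = self._top
        while layer is not None:
            if key in layer.entries:
                value = layer.entries[key]
                return None if value is _TOMBSTONE else value
            layer = layer.parent
        return None

    def fork(self) -> "Bucket":
        # O(1): the current top layer becomes shared and read-only, and both
        # the original bucket and the fork get a fresh writable layer on top.
        shared = self._top
        self._top = Layer(parent=shared)
        return Bucket(Layer(parent=shared))
```

In this sketch, creating a fork copies no keys, and later writes to the base or to any fork stay in that bucket's own top layer. The trade-off is that reads walk the layer chain, so a real design would likely need compaction or reference counting to bound lookup cost and to reclaim layers once forks are discarded.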



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
