Hi, My suggestion is to _not_ support "resumable" operations on a large tree, but instead don't use large operations. But I wouldn't call my solution "sharding", but more "bit-by-bit reindexing". Some more details: For indexing (specially synchronous property indexes) I suggest to do the following, for both a new index, and for reindexing:
1) Reindexing is writing to a new subtree, that is, ":index_1", ":index_2",... Which one is used is (automatically) set in the index itself at /oak:index/<...>, in a hidden property ":writeNode" 2) Reading (queries) use the old subtree if there is any (":index" right now). Which one it used is (automatically) set in the index itself at /oak:index/<...>, in a hidden property ":liveNode" 3) Synchronous index updates are written to the index node defined at ":liveNode". Therefore, at the same time, reindex to a new subtree and index updates to the old subtree can occur (actually more than that, see below at 6). 4) To track reindexing/indexing progress, there is a "current position" persisted at /oak:index/<...>, in a hidden property ":writePath". This entry is automatically advanced by the asynchronous reindexing thread from "/" to "/a" to "/b/a" to "/content/a/b", "/content/x", "/system/03/01", "/system/04", "/system/0a",... and so on, until the whole repository is indexed. 5) The asynchronous indexing thread reads all nodes of the repository _after_ what is written at ":writePath" and indexes that to the new subtree, and once there are 1000 changes to the index, the last indexed path is written to ":writePath", plus the additions to the index, in one transaction. 6) Synchronous index updates are also written to the index node defined at ":writeNode", if they affect nodes before the ":writePath". 7) After a restart, reindexing continues where it left off, using the last ":writePath". 8) After reindexing is done, ":livePath" is updated to ":index_2", and ":writePath" is removed. Also, the old index subtree is removed (1000 nodes per commit). 9) Sorting of path is needed, so that the repository can be processed bit by bit by bit. For that, the following logic is used, recursively: read at most 1000 child nodes. If there are more than 1000, then this subtree is never split but processed in one step (so many child nodes can still lead to large transactions, unfortunately). If less than 1000 child nodes, then the names of all child nodes are read, and processed in sorted order (sorted by node name). Therefore, all reindexing operations use small transactions. Queries can run concurrently. Reindexing can be paused. Reindexing can continue even after a restart. While reindexing, the old index is maintained and up-to-date. The branch-less commit mode is not needed. No conflicts between the synchronous index updates and asynchronous reindexing thread are possible. Regards, Thomas