Hi Team,

Currently SegmentNodeStore does not use a BlobStore by default and stores the binary data within the data tar files. This has the following benefits:

1. Backup is simpler - The user just needs to back up the segmentstore directory
2. No Blob GC - The RevisionGC also deletes the binary content, so a separate Blob GC need not be performed
3. Faster IO - The binary content is fetched via memory mapped files and hence might perform better than streamed IO

However, of late we are seeing issues where the repository is not able to reclaim space from deleted binary content as part of normal cleanup, and full scale compaction needs to be performed to reclaim the space. Running compaction has other issues (see OAK-2045), and currently it needs to be performed offline to get optimum results.

In quite a few cases it has been seen that repository growth is mostly due to Lucene index content changes, which lead to creation of new binary content and also cause fragmentation due to newer revisions. Further, as the Segment logic does not perform de-duplication, any change in a Lucene index file would probably re-create the whole index file (need to confirm).

Given that such repository growth is troublesome, it might be better if we configure a BlobStore by default with SegmentNodeStore (or at least for applications like AEM). This should reduce the rate of repository growth due to

1. De-duplication - BlobStore and DataStore (current impls) implement de-duplication, so adding the same binary again would not cause size growth
2. Less fragmentation - As large binary content would not be part of the data tar files, Blob GC would be able to reclaim space. Currently, if even one bulk segment in a data tar is still referenced, cleanup is not able to remove that tar file. That space can only be reclaimed via compaction.

Compared to the benefits mentioned initially

1. Backup - The user needs to back up two folders
2. Blob GC would need to be run separately
3. Faster IO - That needs to be seen.

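To make the de-duplication point concrete, here is a minimal sketch of a content-addressed store in which adding the same binary twice does not grow the stored size. The class name DedupBlobStoreSketch, the in-memory map and the choice of SHA-256 are assumptions made purely for illustration. This is not the Oak BlobStore or Jackrabbit DataStore API, and the real implementations may differ in how they derive identifiers.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch; not an Oak or Jackrabbit class. */
class DedupBlobStoreSketch {
    // Content hash -> stored bytes; identical content maps to the same key.
    private final Map<String, byte[]> blobs = new HashMap<>();
    private long storedBytes = 0;

    /** Stores the content at most once and returns its content-derived id. */
    String put(byte[] content) {
        String id = sha256Hex(content);
        if (!blobs.containsKey(id)) {
            // Only previously unseen content adds to the stored size.
            blobs.put(id, content.clone());
            storedBytes += content.length;
        }
        return id;
    }

    /** Returns a copy of the stored content, or null if the id is unknown. */
    byte[] get(String id) {
        byte[] stored = blobs.get(id);
        return stored == null ? null : stored.clone();
    }

    /** Total size of the distinct binaries held by this store. */
    long storedBytes() {
        return storedBytes;
    }

    // Hex-encoded SHA-256 digest (assumed hash; chosen only for the sketch).
    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Note that whole-binary de-duplication of this kind only helps when an identical binary is added again. A Lucene index file that changes even slightly would still be stored as a new binary.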
For Lucene, the potential IO slowdown can be mitigated to an extent with the proposed CopyOnReadDirectory support in OAK-1724.

Further, we also get the benefit of sharing the BlobStore between multiple instances if required!

Thoughts?

Chetan Mehrotra