[ https://issues.apache.org/jira/browse/OAK-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomek Rękawek updated OAK-6921:
-------------------------------
    Attachment:     (was: OAK-6921.patch)

> Support pluggable segment storage
> ---------------------------------
>
>                 Key: OAK-6921
>                 URL: https://issues.apache.org/jira/browse/OAK-6921
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>             Fix For: 1.9.0
>
>         Attachments: OAK-6921.patch
>
>
> h3. Rationale
> segment-tar, as the name suggests, stores the segments in a bunch of tar
> archives inside the {{segmentstore}} directory on the local file system. In
> some cases, especially in cloud deployments, it may be useful to store the
> segments outside the local FS: remote storage such as Amazon S3, Azure Blob
> Storage or HDFS may be cheaper than a mounted disk, more scalable, easier to
> provision, etc.
> h3. Current state
> There are 3 classes responsible for handling tar files in segment-tar:
> TarFiles, TarWriter and TarReader. TarFiles manages the {{segmentstore}}
> directory, scans it for .tar files and creates a TarReader for each one. It
> also creates a single TarWriter object, used to write (and also read) the
> most recent tar file.
> The TarWriter appends segments to the latest tar file and also serializes the
> auxiliary indexes: the segment index, the binary references index and the
> segment graph. It also takes care of synchronization, as we're dealing with a
> mutable data structure - a tar file opened in append mode.
> The TarReader not only reads the segments from the tar file, but is also
> responsible for the revision GC (mark & sweep methods) and for recovering
> data from files which haven't been closed cleanly (e.g. have no index).
> h3. New abstraction layer
> In order to store segments somewhere other than in tar files, it would be
> possible to create custom implementations of TarFiles, TarWriter and
> TarReader.
> However, such an implementation would duplicate a lot of code that is not
> strictly related to persistence - mark(), sweep(), synchronization, etc.
> Rather than that, the attached patch presents a different approach: a new
> layer of abstraction is injected into TarFiles, TarWriter and TarReader. It
> only takes care of segment persistence and knows nothing about
> synchronization, GC, etc. - leaving that to the upper layer.
> The new abstraction layer is modelled using 3 new classes:
> SegmentArchiveManager, SegmentArchiveReader and SegmentArchiveWriter. They
> are closely related to the existing Tar* classes and are used by them.
> SegmentArchiveManager provides a bunch of file system-style methods, like
> open(), create(), delete(), exists(), etc. open() and create() return
> instances of SegmentArchiveReader (SAReader) and SegmentArchiveWriter
> (SAWriter), respectively.
> SegmentArchiveReader, apart from reading segments, can also load and parse
> the index, graph and binary references. The logic responsible for parsing
> these structures has already been extracted, so it doesn't need to be
> duplicated in the SAReader implementations. Also, the SAReader needs to be
> aware of the index, since it contains the segment offsets.
> The SAWriter class allows writing and reading segments and also storing the
> indexes. It isn't thread-safe - it assumes that synchronization is already
> done on the higher layers.
> In the patch, I've moved the tar implementation to the new classes:
> SegmentTarManager, SegmentTarReader and SegmentTarWriter.
> h3. TODO
> * The names and package locations of all the affected classes are subject
> to change - after applying the patch, TarFiles doesn't deal with .tar files
> anymore; similarly, TarReader and TarWriter delegate the low-level file
> access duties to SegmentArchiveReader and SegmentArchiveWriter. I didn't want
> to change the names yet, to make it easier to understand the patch and to
> rebase it on top of the trunk changes.
> * Add JUnit documentation to the new interfaces.
> * SegmentNodeStoreService should be able to obtain the SegmentArchiveManager
> service from OSGi (so that implementations can be provided by other bundles).
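
To make the layering described under "New abstraction layer" easier to picture, here is a minimal sketch in Java. It is not taken from the attached patch: only the type names and the open(), create(), delete() and exists() method names come from the description above. Every other signature, the in-memory backend, the SynchronizingWriter stand-in for the upper layer and the archive name are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical shapes: parameter and return types are assumptions,
// not the signatures used in the OAK-6921 patch.
interface SegmentArchiveReader {
    ByteBuffer readSegment(UUID id) throws IOException; // null if the segment is absent
}

interface SegmentArchiveWriter {
    void writeSegment(UUID id, byte[] data) throws IOException;
    ByteBuffer readSegment(UUID id) throws IOException;
    void close() throws IOException;
}

interface SegmentArchiveManager {
    SegmentArchiveReader open(String archiveName) throws IOException;
    SegmentArchiveWriter create(String archiveName) throws IOException;
    boolean delete(String archiveName);
    boolean exists(String archiveName);
}

// Toy persistence backend that keeps archives on the heap. It knows nothing
// about locking or GC, mirroring the split of duties described in the issue.
final class InMemoryArchiveManager implements SegmentArchiveManager {
    private final Map<String, Map<UUID, byte[]>> archives = new LinkedHashMap<>();

    public SegmentArchiveWriter create(String archiveName) {
        Map<UUID, byte[]> segments = new LinkedHashMap<>();
        archives.put(archiveName, segments);
        return new SegmentArchiveWriter() {
            public void writeSegment(UUID id, byte[] data) { segments.put(id, data.clone()); }
            public ByteBuffer readSegment(UUID id) { return wrap(segments.get(id)); }
            public void close() { /* nothing to flush in memory */ }
        };
    }

    public SegmentArchiveReader open(String archiveName) throws IOException {
        Map<UUID, byte[]> segments = archives.get(archiveName);
        if (segments == null) {
            throw new IOException("No such archive: " + archiveName);
        }
        return id -> wrap(segments.get(id));
    }

    public boolean delete(String archiveName) { return archives.remove(archiveName) != null; }

    public boolean exists(String archiveName) { return archives.containsKey(archiveName); }

    private static ByteBuffer wrap(byte[] data) {
        return data == null ? null : ByteBuffer.wrap(data).asReadOnlyBuffer();
    }
}

// Stand-in for the upper layer (the role TarWriter plays in the issue):
// it owns the synchronization, so the archive writer does not have to.
final class SynchronizingWriter {
    private final SegmentArchiveWriter delegate;

    SynchronizingWriter(SegmentArchiveWriter delegate) { this.delegate = delegate; }

    synchronized void writeEntry(UUID id, byte[] data) throws IOException {
        delegate.writeSegment(id, data);
    }

    synchronized ByteBuffer readEntry(UUID id) throws IOException {
        return delegate.readSegment(id);
    }
}

final class ArchiveSketchDemo {
    public static void main(String[] args) throws IOException {
        SegmentArchiveManager manager = new InMemoryArchiveManager();
        String name = "archive-0001"; // arbitrary illustrative name
        SynchronizingWriter writer = new SynchronizingWriter(manager.create(name));

        UUID id = UUID.randomUUID();
        writer.writeEntry(id, new byte[] {1, 2, 3});

        SegmentArchiveReader reader = manager.open(name);
        System.out.println(reader.readSegment(id).remaining()); // prints 3
        System.out.println(manager.delete(name));               // prints true
        System.out.println(manager.exists(name));               // prints false
    }
}
```

Under these assumptions, a backend for S3, Azure Blob Storage or HDFS would only need to supply its own SegmentArchiveManager, while mark(), sweep() and locking stay in the existing Tar* classes.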