[ 
https://issues.apache.org/jira/browse/OAK-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek reassigned OAK-6921:
----------------------------------

    Assignee: Tomek Rękawek

> Support pluggable segment storage
> ---------------------------------
>
>                 Key: OAK-6921
>                 URL: https://issues.apache.org/jira/browse/OAK-6921
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6921.patch, current-state.png, new-interfaces.png
>
>
> h3. Rationale
> segment-tar, as the name suggests, stores the segments in a set of tar 
> archives, inside the {{segmentstore}} directory on the local file system. In 
> some cases, especially in cloud deployments, it may be useful to store the 
> segments outside the local FS: remote storage such as Amazon S3, Azure Blob 
> Storage or HDFS may be cheaper than a mounted disk, more scalable, easier to 
> provision, etc.
> h3. Storing segments in tar files
> !current-state.png!
> There are 3 classes responsible for handling tar files in segment-tar: 
> TarFiles, TarWriter and TarReader. TarFiles manages the {{segmentstore}} 
> directory, scans it for .tar files and creates a TarReader for each one. It 
> also creates a single TarWriter object, used to write (and also read) the 
> most recent tar file.
> The TarWriter appends segments to the latest tar file and also serializes the 
> auxiliary indexes: the segment index, the binary references index and the 
> segment graph. It also takes care of synchronization, as we're dealing with 
> a mutable data structure: a tar file opened in append mode.
> The TarReader not only reads the segments from the tar file, but is also 
> responsible for the revision GC (mark & sweep methods) and for recovering 
> data from files which haven't been closed cleanly (e.g. have no index).
> h3. New abstraction layer - SegmentArchiveManager
> !new-interfaces.png!
> In order to store segments somewhere other than tar files, it'd be possible 
> to create separate implementations of TarFiles, TarWriter and TarReader. 
> However, such implementations would duplicate a lot of code that isn't 
> strictly related to persistence: mark(), sweep(), synchronization, etc. 
> Instead, the attached patch takes a different approach: a new layer of 
> abstraction is injected into TarFiles, TarWriter and TarReader. It only 
> takes care of segment persistence and knows nothing about synchronization, 
> GC, etc., leaving those to the upper layer.
> The new abstraction layer is modelled using 3 new classes: 
> SegmentArchiveManager, SegmentArchiveReader and SegmentArchiveWriter. They 
> are closely related to the existing Tar* classes and are used by them.
> SegmentArchiveManager provides a set of file system-style methods, like 
> open(), create(), delete(), exists(), etc. The open() and create() methods 
> return instances of SegmentArchiveReader and SegmentArchiveWriter, 
> respectively.
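
A minimal sketch of what such a file system-style manager could look like, with a toy in-memory backend standing in for a remote store. All names and signatures here (ArchiveManager, ArchiveReader, ArchiveWriter, InMemoryArchiveManager) are hypothetical and simplified; they are not the interfaces from the attached patch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified interfaces; not the signatures from the patch.
interface ArchiveWriter {
    void writeSegment(UUID id, byte[] data);
    void close();
}

interface ArchiveReader {
    byte[] readSegment(UUID id); // returns null if the segment is absent
}

interface ArchiveManager {
    ArchiveWriter create(String archiveName);
    ArchiveReader open(String archiveName); // returns null if the archive is absent
    boolean exists(String archiveName);
    boolean delete(String archiveName);
}

// Toy backend keeping archives in a map; a real one might talk to S3, Azure or HDFS.
class InMemoryArchiveManager implements ArchiveManager {
    private final Map<String, Map<UUID, byte[]>> archives = new HashMap<>();

    public ArchiveWriter create(String archiveName) {
        Map<UUID, byte[]> segments = new LinkedHashMap<>();
        archives.put(archiveName, segments);
        return new ArchiveWriter() {
            public void writeSegment(UUID id, byte[] data) {
                segments.put(id, data.clone());
            }
            public void close() {
                // nothing to flush for the in-memory backend
            }
        };
    }

    public ArchiveReader open(String archiveName) {
        Map<UUID, byte[]> segments = archives.get(archiveName);
        if (segments == null) {
            return null;
        }
        return id -> {
            byte[] data = segments.get(id);
            return data == null ? null : data.clone();
        };
    }

    public boolean exists(String archiveName) {
        return archives.containsKey(archiveName);
    }

    public boolean delete(String archiveName) {
        return archives.remove(archiveName) != null;
    }
}
```

Note that nothing in this layer knows about GC or locking; under this design those concerns would stay in the callers.
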
> SegmentArchiveReader, apart from reading segments, can also load and parse 
> the index, graph and binary references. The logic responsible for parsing 
> these structures has already been extracted, so it doesn't need to be 
> duplicated in the SegmentArchiveReader implementations. The 
> SegmentArchiveReader also needs to be aware of the index, since it contains 
> the segment offsets.
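
To illustrate why a reader needs the index, here is a small sketch in which the index maps each segment id to its offset and length inside the archive. The IndexEntry and IndexedArchiveReader classes are made up for this example, and the archive is just a byte array; the real index format and reader API are not shown here.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.UUID;

// Hypothetical index entry: where a segment lives inside the archive.
final class IndexEntry {
    final int offset;
    final int length;

    IndexEntry(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical reader over an archive held in memory as a byte array.
final class IndexedArchiveReader {
    private final byte[] archive;
    private final Map<UUID, IndexEntry> index;

    IndexedArchiveReader(byte[] archive, Map<UUID, IndexEntry> index) {
        this.archive = archive;
        this.index = index;
    }

    boolean containsSegment(UUID id) {
        return index.containsKey(id);
    }

    // Looks up the offset in the index, then copies out just that range.
    byte[] readSegment(UUID id) {
        IndexEntry entry = index.get(id);
        if (entry == null) {
            return null;
        }
        return Arrays.copyOfRange(archive, entry.offset, entry.offset + entry.length);
    }
}
```

Without the index, the reader would have to scan the whole archive to find a segment, which would be especially costly against remote storage.
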
> The SegmentArchiveWriter class allows writing and reading segments and also 
> storing the indexes. It isn't thread-safe: it assumes that synchronization 
> is already done in the higher layers.
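
The split of responsibilities could look roughly like the sketch below: a low-level writer with no locking at all, and an upper layer that serializes access to it. PlainSegmentWriter and SynchronizedAppender are hypothetical names used only for illustration, and the upper layer is reduced to a single lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical low-level writer: deliberately no locking, like the persistence layer.
final class PlainSegmentWriter {
    private final List<byte[]> segments = new ArrayList<>();

    void writeSegment(byte[] data) {
        segments.add(data.clone());
    }

    int segmentCount() {
        return segments.size();
    }
}

// Hypothetical upper layer: owns the writer and guards every call with one lock.
final class SynchronizedAppender {
    private final Object lock = new Object();
    private final PlainSegmentWriter writer = new PlainSegmentWriter();

    void append(byte[] data) {
        synchronized (lock) {
            writer.writeSegment(data);
        }
    }

    int segmentCount() {
        synchronized (lock) {
            return writer.segmentCount();
        }
    }
}
```

Keeping the lock in one place means each new storage backend only has to implement plain reads and writes.
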
> In the patch, I've moved the tar implementation to the new classes: 
> SegmentTarManager, SegmentTarReader and SegmentTarWriter.
> h3. Other files
> Apart from the segments, the {{segmentstore}} directory also contains the 
> following files:
> * repo.lock
> * journal.log
> * gc.log
> * manifest
> All these files are supported by the new SegmentNodeStorePersistence. Each 
> of them is usually handled through a simple interface (RepositoryLock, 
> JournalLogFile, etc.).
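
As an illustration of how small such a per-file interface can be, here is a sketch of a journal abstraction with an in-memory implementation. The JournalLog interface and its methods are assumptions made for this example; they are not the JournalLogFile interface from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical journal abstraction: append a line, read all lines back.
interface JournalLog {
    void writeLine(String line);

    List<String> readLines();
}

// Toy implementation; a file-based or remote one would implement the same interface.
final class InMemoryJournalLog implements JournalLog {
    private final List<String> lines = new ArrayList<>();

    public void writeLine(String line) {
        lines.add(line);
    }

    public List<String> readLines() {
        // Return a copy so callers cannot modify the journal contents.
        return new ArrayList<>(lines);
    }
}
```
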
> h3. TODO
> * The names and package locations of all the affected classes are subject 
> to change. After applying the patch, TarFiles doesn't deal with the .tar 
> files anymore; similarly, TarReader and TarWriter delegate the low-level 
> file access duties to SegmentArchiveReader and SegmentArchiveWriter. I 
> didn't want to change the names yet, to make it easier to understand the 
> patch and rebase it onto trunk changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)