[ https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402371#comment-16402371 ]
Tomek Rękawek commented on OAK-6922: ------------------------------------ [~frm], [~mduerig] - thanks for the comments. I moved all the required interfaces to SPI packages and applied all the suggestions (I think). Since the changes in the oak-segment-tar are now quite extensive, I created a separate issue to cover the SPI updated: OAK-7355. See the issue for the patch and the summary of changes. The [^OAK-6922-3.patch] now only contains the new Azure bundle, it requires the OAK-7355 to work. > Azure support for the segment-tar > --------------------------------- > > Key: OAK-6922 > URL: https://issues.apache.org/jira/browse/OAK-6922 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: segment-tar > Reporter: Tomek Rękawek > Assignee: Tomek Rękawek > Priority: Major > Fix For: 1.9.0, 1.10 > > Attachments: OAK-6922-2.patch, OAK-6922-3.patch, OAK-6922.patch > > > An Azure Blob Storage implementation of the segment storage, based on the > OAK-6921 work. > h3. Segment files layout > Thew new implementation doesn't use tar files. They are replaced with > directories, storing segments, named after their UUIDs. This approach has > following advantages: > * no need to call seek(), which may be expensive on a remote file system. > Rather than that we can read the whole file (=segment) at once. > * it's possible to send multiple segments at once, asynchronously, which > reduces the performance overhead (see below). > The file structure is as follows: > {noformat} > [~]$ az storage blob list -c oak --output table > Name Blob Type > Blob Tier Length Content Type Last Modified > ------------------------------------------------------------ ----------- > ----------- -------- ------------------------ ------------------------- > oak/data00000a.tar/0000.ca1326d1-edf4-4d53-aef0-0f14a6d05b63 BlockBlob > 192 application/octet-stream 2018-01-31T10:59:14+00:00 > oak/data00000a.tar/0001.c6e03426-db9d-4315-a20a-12559e6aee54 BlockBlob > 262144 application/octet-stream 2018-01-31T10:59:14+00:00 > oak/data00000a.tar/0002.b3784e27-6d16-4f80-afc1-6f3703f6bdb9 BlockBlob > 262144 application/octet-stream 2018-01-31T10:59:14+00:00 > oak/data00000a.tar/0003.5d2f9588-0c92-4547-abf7-0263ee7c37bb BlockBlob > 259216 application/octet-stream 2018-01-31T10:59:14+00:00 > ... > oak/data00000a.tar/006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48 BlockBlob > 4368 application/octet-stream 2018-01-31T12:01:09+00:00 > oak/data00000a.tar/006f.93799ae9-288e-4b32-afc2-bbc676fad7e5 BlockBlob > 3792 application/octet-stream 2018-01-31T12:01:14+00:00 > oak/data00000a.tar/0070.8b2d5ff2-6a74-4ac3-a3cc-cc439367c2aa BlockBlob > 3680 application/octet-stream 2018-01-31T12:01:14+00:00 > oak/data00000a.tar/0071.2a1c49f0-ce33-4777-a042-8aa8a704d202 BlockBlob > 7760 application/octet-stream 2018-01-31T12:10:54+00:00 > oak/journal.log.001 AppendBlob > 1010 application/octet-stream 2018-01-31T12:10:54+00:00 > oak/manifest BlockBlob > 46 application/octet-stream 2018-01-31T10:59:14+00:00 > oak/repo.lock BlockBlob > application/octet-stream 2018-01-31T10:59:14+00:00 > {noformat} > For the segment files, each name is prefixed with the index number. This > allows to maintain an order, as in the tar archive. This order is normally > stored in the index files as well, but if it's missing, the recovery process > uses the prefixes to maintain it. > Each file contains the raw segment data, with no padding/headers. Apart from > the segment files, there are 3 special files: binary references (.brf), > segment graph (.gph) and segment index (.idx). > h3. Asynchronous writes > Normally, all the TarWriter writes are synchronous, appending the segments to > the tar file. In case of Azure Blob Storage each write involves a network > latency. That's why the SegmentWriteQueue was introduced. The segments are > added to the blocking dequeue, which is served by a number of the consumer > threads, writing the segments to the cloud. There's also a map UUID->Segment, > which allows to return the segments in case they are requested by the > readSegment() method before they are actually persisted. Segments are removed > from the map only after a successful write operation. > The flush() method blocks accepting the new segments and returns after all > waiting segments are written. The close() method waits until the current > operations are finished and stops all threads. > The asynchronous mode can be disabled by setting the number of threads to 0. > h5. Queue recovery mode > If the Azure Blob Storage write() operation fails, the segment will be > re-added and the queue is switched to an "recovery mode". In this mode, all > the threads are suspended and new segments are not accepted (active waiting). > There's a single thread which retries adding the segment with some delay. If > the segment is successfully written, the queue will back to the normal > operation. > This way the unavailable remote service is not flooded by the requests and > we're not accepting the segments when we can't persist them. > The close() method finishes the recovery mode - in this case, some of the > awaiting segments won't be persisted. > h5. Consistency > The asynchronous mode isn't as reliable as the standard, synchronous case. > Following cases are possible: > * TarWriter#writeEntry() returns successfully, but the segments are not > persisted. > * TarWriter#writeEntry() accepts a number of segments: S1, S2, S3. The S2 and > S3 are persisted, but the S1 is not. > On the other hand: > * If the TarWriter#flush() returns successfully, it means that all the > accepted segments has been persisted. > h5. Recovery > During the segment recovery (eg. if the index file is missing), the Azure > implementation checks if there's no missing segment in the middle. If so, > only the consecutive segments are recovered. For instance, if we have S1, S2, > S3, S5, S6, S7, then the recovery process will return only the first three. > Patch: > * [OAK-6922.patch|https://github.com/trekawek/jackrabbit-oak/pull/10.diff] > * [GitHub diff|https://github.com/trekawek/jackrabbit-oak/pull/10/files] > * [branch|https://github.com/trekawek/jackrabbit-oak/tree/OAK-6922] -- This message was sent by Atlassian JIRA (v7.6.3#76005)