[ 
https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-6922:
-------------------------------
    Comment: was deleted

(was: https://github.com/trekawek/jackrabbit-oak/tree/segment-tar/hdfs)

> Azure support for the segment-tar
> ---------------------------------
>
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6922.patch
>
>
> An Azure Blob Storage implementation of the segment storage, based on the 
> OAK-6921 work.
> h3. Segment files layout
> The new implementation doesn't use tar files. They are replaced with 
> directories storing segments named after their UUIDs. This approach has 
> the following advantages:
> * no need to call seek(), which may be expensive on a remote file system. 
> Instead, we can read the whole file (= segment) at once.
> * it's possible to send multiple segments at once, asynchronously, which 
> reduces the performance overhead (see below).
> The file structure is as follows:
> {noformat}
> $ hdfs dfs -ls /oak/data00000a.tar
> Found 517 items
> -rw-r--r--   1 rekawek supergroup        192 2017-11-08 13:24 
> /oak/data00000a.tar/0000.b1d032cd-266d-4fd6-acf4-9828f54e2b40
> -rw-r--r--   1 rekawek supergroup     262112 2017-11-08 13:24 
> /oak/data00000a.tar/0001.445ca696-d5d1-4843-a04b-044f84d93663
> -rw-r--r--   1 rekawek supergroup     262144 2017-11-08 13:24 
> /oak/data00000a.tar/0002.91ce6f93-d7ed-4d34-a383-3c3d2eea2acb
> -rw-r--r--   1 rekawek supergroup     262144 2017-11-08 13:24 
> /oak/data00000a.tar/0003.43c09e6f-3d62-4747-ac75-de41b850862a
> (...)
> -rw-r--r--   1 rekawek supergroup     191888 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.brf
> -rw-r--r--   1 rekawek supergroup     823436 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.gph
> -rw-r--r--   1 rekawek supergroup      17408 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.idx
> {noformat}
> Each segment file name is prefixed with an index number. This makes it 
> possible to maintain the segment order, as in the tar archive. The order is 
> normally stored in the index file as well, but if that file is missing, the 
> recovery process relies on the prefixes.
> Each file contains the raw segment data, with no padding/headers. Apart from 
> the segment files, there are 3 special files: binary references (.brf), 
> segment graph (.gph) and segment index (.idx).
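> The index-prefixed naming scheme above can be sketched as follows. This is a 
> hypothetical helper, not the actual Oak code; the class name, the decimal 
> zero-padded index format and the method names are assumptions for illustration.

```java
import java.util.UUID;

// Hypothetical sketch of the "<index>.<uuid>" segment file naming
// scheme described above (not the actual Oak implementation).
final class SegmentFileName {

    // Builds a name like "0002.91ce6f93-d7ed-4d34-a383-3c3d2eea2acb";
    // a zero-padded decimal index is assumed here.
    static String format(int index, UUID uuid) {
        return String.format("%04d.%s", index, uuid);
    }

    // Recovers the ordering index from the name prefix, which the
    // recovery process can use when the .idx file is missing.
    static int parseIndex(String name) {
        return Integer.parseInt(name.substring(0, name.indexOf('.')));
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("91ce6f93-d7ed-4d34-a383-3c3d2eea2acb");
        String name = format(2, u);
        System.out.println(name); // prints 0002.91ce6f93-d7ed-4d34-a383-3c3d2eea2acb
        System.out.println(parseIndex(name)); // prints 2
    }
}
```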
> h3. Asynchronous writes
> Normally, all TarWriter writes are synchronous, appending the segments to 
> the tar file. With Azure Blob Storage, each write incurs network 
> latency. That's why the SegmentWriteQueue was introduced. Segments are 
> added to a blocking deque, which is served by a number of consumer 
> threads writing the segments to the cloud. There's also a UUID->Segment map, 
> which makes it possible to return segments requested by the 
> readSegment() method before they are actually persisted. Segments are removed 
> from the map only after a successful write operation.
> The flush() method blocks accepting new segments and returns after all 
> waiting segments have been written. The close() method waits until the current 
> operations are finished and stops all threads.
> The asynchronous mode can be disabled by setting the number of threads to 0.
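> The deque/consumer/map mechanics described above can be sketched as follows. 
> This is a minimal, hypothetical illustration, not the actual SegmentWriteQueue 
> class from the patch; the SegmentConsumer interface and the simplified flush() 
> polling are assumptions, and the real code switches to a recovery mode on 
> failure instead of retrying immediately.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

// Hypothetical sketch of the asynchronous write queue described above.
final class SegmentWriteQueue {

    interface SegmentConsumer { void write(UUID id, byte[] data) throws Exception; }

    private final BlockingDeque<UUID> queue = new LinkedBlockingDeque<>();
    private final Map<UUID, byte[]> pending = new ConcurrentHashMap<>();
    private final ExecutorService workers;
    private final SegmentConsumer writer;

    SegmentWriteQueue(int threads, SegmentConsumer writer) {
        this.writer = writer;
        this.workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.submit(this::drain);
        }
    }

    void addSegment(UUID id, byte[] data) {
        pending.put(id, data); // visible to readSegment() before it is persisted
        queue.addLast(id);
    }

    byte[] readSegment(UUID id) {
        return pending.get(id); // null once the segment has been persisted
    }

    private void drain() {
        try {
            while (true) {
                UUID id = queue.takeFirst();
                byte[] data = pending.get(id);
                try {
                    writer.write(id, data);
                    pending.remove(id); // removed only after a successful write
                } catch (Exception e) {
                    queue.addFirst(id); // re-add; the real queue enters recovery mode
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // close(): stop this worker
        }
    }

    // Simplified flush(): poll until all accepted segments are persisted.
    void flush() throws InterruptedException {
        while (!pending.isEmpty()) {
            Thread.sleep(10);
        }
    }

    void close() {
        workers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        Map<UUID, byte[]> store = new ConcurrentHashMap<>();
        SegmentWriteQueue q = new SegmentWriteQueue(2, store::put);
        UUID id = UUID.randomUUID();
        q.addSegment(id, new byte[] {1, 2, 3});
        q.flush();
        System.out.println("persisted: " + store.containsKey(id)); // prints persisted: true
        q.close();
    }
}
```

> Note how the map, not the deque, backs readSegment(): a segment stays readable 
> from memory for exactly as long as it is not yet safely in the cloud.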
> h5. Queue recovery mode
> If the Azure Blob Storage write() operation fails, the segment is 
> re-added and the queue switches to a "recovery mode". In this mode, all 
> the consumer threads are suspended and new segments are not accepted (active 
> waiting). A single thread retries adding the segment with some delay. Once 
> the segment is successfully written, the queue returns to normal 
> operation.
> This way the unavailable remote service is not flooded with requests, and 
> we don't accept segments while we can't persist them.
> The close() method ends the recovery mode; in this case, some of the 
> waiting segments won't be persisted.
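> The single-threaded retry-with-delay loop at the heart of the recovery mode can 
> be sketched as follows. This is a hypothetical simplification; the fixed delay, 
> the Write interface and the attempt counter are illustration-only assumptions, 
> and the real code also suspends the other consumers while this loop runs.

```java
// Hypothetical sketch of the recovery-mode retry loop described above:
// one thread retries the failed write with a delay instead of flooding
// the unavailable remote service with requests.
final class RecoveryRetry {

    interface Write { void run() throws Exception; }

    // Retries 'write' with a fixed delay until it succeeds;
    // returns the number of attempts made.
    static int retryUntilSuccess(Write write, long delayMillis)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                write.run();
                return attempts; // success: the queue returns to normal operation
            } catch (Exception e) {
                Thread.sleep(delayMillis); // back off before the next attempt
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] failuresLeft = {2}; // simulate a service that fails twice
        int attempts = retryUntilSuccess(() -> {
            if (failuresLeft[0]-- > 0) throw new Exception("service unavailable");
        }, 10);
        System.out.println("succeeded after " + attempts + " attempts"); // prints succeeded after 3 attempts
    }
}
```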
> h5. Consistency
> The asynchronous mode isn't as reliable as the standard, synchronous case. 
> The following cases are possible:
> * TarWriter#writeEntry() returns successfully, but the segments are not 
> persisted.
> * TarWriter#writeEntry() accepts a number of segments: S1, S2, S3. S2 and 
> S3 are persisted, but S1 is not.
> On the other hand:
> * If TarWriter#flush() returns successfully, all the 
> accepted segments have been persisted.
> h5. Recovery
> During the segment recovery (e.g. if the index file is missing), the HDFS 
> implementation checks whether a segment is missing in the middle. If so, 
> only the leading consecutive segments are recovered. For instance, if we have 
> S1, S2, S3, S5, S6, S7, then the recovery process will return only the first 
> three.
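> The "consecutive prefix only" rule can be sketched as follows. This is a 
> hypothetical helper working on already-sorted segment indices, not the actual 
> recovery code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the recovery rule described above: stop at the
// first gap in the sorted segment indices and keep only the prefix.
final class SegmentRecovery {

    static List<Integer> consecutivePrefix(List<Integer> sortedIndices) {
        List<Integer> recovered = new ArrayList<>();
        for (int i = 0; i < sortedIndices.size(); i++) {
            if (i > 0 && sortedIndices.get(i) != sortedIndices.get(i - 1) + 1) {
                break; // gap found: segments after it are not recovered
            }
            recovered.add(sortedIndices.get(i));
        }
        return recovered;
    }

    public static void main(String[] args) {
        // S1, S2, S3, S5, S6, S7 -> only the first three are recovered
        System.out.println(consecutivePrefix(List.of(1, 2, 3, 5, 6, 7))); // prints [1, 2, 3]
    }
}
```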



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
