[ 
https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-6922:
-------------------------------
    Summary: Azure support for the segment-tar  (was: HDFS support for the 
segment-tar)

> Azure support for the segment-tar
> ---------------------------------
>
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6922.patch
>
>
> An HDFS implementation of the segment storage, based on the OAK-6921 work.
> h3. HDFS
> HDFS is a distributed, networked file system. The most popular implementation 
> is Apache Hadoop, but HDFS is also available in the Amazon AWS and 
> Microsoft Azure clouds. Despite being a file system, it requires a custom API 
> - unlike NFS or CIFS, it can't simply be mounted locally.
> h3. Segment files layout
> The new implementation doesn't use tar files. They are replaced with 
> directories storing segments named after their UUIDs. This approach has the 
> following advantages:
> * no need to call seek(), which may be expensive on a remote file system. 
> Instead, we can read the whole file (= segment) at once.
> * multiple segments can be sent at once, asynchronously, which reduces the 
> performance overhead (see below).
> The file structure is as follows:
> {noformat}
> $ hdfs dfs -ls /oak/data00000a.tar
> Found 517 items
> -rw-r--r--   1 rekawek supergroup        192 2017-11-08 13:24 
> /oak/data00000a.tar/0000.b1d032cd-266d-4fd6-acf4-9828f54e2b40
> -rw-r--r--   1 rekawek supergroup     262112 2017-11-08 13:24 
> /oak/data00000a.tar/0001.445ca696-d5d1-4843-a04b-044f84d93663
> -rw-r--r--   1 rekawek supergroup     262144 2017-11-08 13:24 
> /oak/data00000a.tar/0002.91ce6f93-d7ed-4d34-a383-3c3d2eea2acb
> -rw-r--r--   1 rekawek supergroup     262144 2017-11-08 13:24 
> /oak/data00000a.tar/0003.43c09e6f-3d62-4747-ac75-de41b850862a
> (...)
> -rw-r--r--   1 rekawek supergroup     191888 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.brf
> -rw-r--r--   1 rekawek supergroup     823436 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.gph
> -rw-r--r--   1 rekawek supergroup      17408 2017-11-08 13:32 
> /oak/data00000a.tar/data00000a.tar.idx
> {noformat}
> Each segment file name is prefixed with its index number. This maintains an 
> ordering, as in the tar archive. The order is normally stored in the index 
> file as well, but if that file is missing, the recovery process needs the 
> prefixes.
> Each file contains the raw segment data, with no padding or headers. Apart 
> from the segment files, there are three special files: binary references 
> (.brf), segment graph (.gph) and segment index (.idx).
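> The naming scheme above can be sketched as follows. This is a hypothetical 
> helper, not the actual Oak code, and it assumes a four-digit, zero-padded 
> hex index prefix - the truncated listing alone doesn't show whether the 
> real prefix is decimal or hex.

```java
import java.util.UUID;

// Sketch of "<index>.<uuid>" segment entry names (assumed hex prefix).
public class SegmentName {

    // Builds a name such as "0001.445ca696-d5d1-4843-a04b-044f84d93663".
    static String entryName(int position, UUID uuid) {
        return String.format("%04x.%s", position, uuid);
    }

    // Recovers the write position from the name prefix, so the ordering
    // can be restored even when the .idx file is missing.
    static int position(String entryName) {
        return Integer.parseInt(entryName.substring(0, entryName.indexOf('.')), 16);
    }
}
```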
> h3. Asynchronous writes
> Normally, all TarWriter writes are synchronous, appending the segments to 
> the tar file. With HDFS, each write involves network latency. That's why the 
> SegmentWriteQueue was introduced. Segments are added to a blocking deque, 
> which is served by a number of consumer threads writing the segments to 
> HDFS. There's also a UUID->Segment map, which allows segments to be returned 
> if they are requested by the readSegment() method before they are actually 
> persisted. Segments are removed from the map only after a successful write 
> operation.
> The flush() method stops accepting new segments and returns after all 
> waiting segments are written. The close() method waits until the current 
> operations are finished and stops all threads.
> The asynchronous mode can be disabled by setting the number of threads to 0.
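> A minimal sketch of that queue, with simplified names and types (a byte[] 
> stands in for a segment; this is not the actual SegmentWriteQueue API):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Sketch: blocking deque + UUID->segment map served by consumer threads.
class WriteQueueSketch {
    private final BlockingDeque<UUID> queue = new LinkedBlockingDeque<>();
    // UUID -> segment map: lets readSegment() serve not-yet-persisted data.
    private final Map<UUID, byte[]> pending = new ConcurrentHashMap<>();
    private final ExecutorService consumers;
    private final BiConsumer<UUID, byte[]> backend; // the slow remote write

    WriteQueueSketch(int threads, BiConsumer<UUID, byte[]> backend) {
        this.backend = backend;
        this.consumers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            consumers.submit(this::consume);
        }
    }

    // Called by the writer: enqueue and return immediately.
    void addSegment(UUID id, byte[] data) {
        pending.put(id, data);
        queue.addLast(id);
    }

    // Called by readSegment(): may serve a segment not yet persisted.
    byte[] readSegment(UUID id) {
        return pending.get(id);
    }

    private void consume() {
        try {
            while (true) {
                UUID id = queue.takeFirst();
                byte[] data = pending.get(id);
                backend.accept(id, data); // remote write
                pending.remove(id);       // only after a successful write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Simplified flush(): wait until everything queued has been persisted.
    void flush() throws InterruptedException {
        while (!pending.isEmpty()) {
            TimeUnit.MILLISECONDS.sleep(10);
        }
    }

    void close() {
        consumers.shutdownNow();
    }
}
```

> The flush() here is simplified to polling; the real implementation also 
> blocks new additions while flushing.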
> h5. Queue recovery mode
> If an HDFS write() operation fails, the segment is re-added and the queue 
> switches to a "recovery mode". In this mode, all the threads are suspended 
> and new segments are not accepted (active waiting). A single thread retries 
> adding the segment with some delay. Once the segment is successfully 
> written, the queue returns to normal operation.
> This way, an unavailable HDFS service is not flooded with requests, and 
> we don't accept segments when we can't persist them.
> The close() method ends the recovery mode - in this case, some of the 
> waiting segments won't be persisted.
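> The retry loop at the heart of the recovery mode can be sketched like this. 
> The names are hypothetical, and the attempt bound is an addition for the 
> sketch - in the description above the queue retries until it succeeds or 
> close() is called:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the single-threaded recovery-mode retry loop.
class RetrySketch {

    // Retries one write with a fixed delay between attempts. While this
    // runs, consumers stay suspended and new segments are not accepted.
    static int writeWithRetry(Runnable write, long delayMillis, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return attempt; // success: queue resumes normal operation
            } catch (RuntimeException e) {
                TimeUnit.MILLISECONDS.sleep(delayMillis); // don't flood HDFS
            }
        }
        throw new IllegalStateException("giving up after " + maxAttempts + " attempts");
    }
}
```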
> h5. Consistency
> The asynchronous mode isn't as reliable as the standard, synchronous one. 
> The following cases are possible:
> * TarWriter#writeEntry() returns successfully, but the segments are not 
> persisted.
> * TarWriter#writeEntry() accepts a number of segments: S1, S2, S3. S2 and 
> S3 are persisted, but S1 is not.
> On the other hand:
> * If TarWriter#flush() returns successfully, all the accepted segments 
> have been persisted.
> h5. Recovery
> During segment recovery (eg. if the index file is missing), the HDFS 
> implementation checks whether any segment is missing in the middle. If so, 
> only the initial consecutive run of segments is recovered. For instance, if 
> we have S1, S2, S3, S5, S6, S7, the recovery process will return only the 
> first three.
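> The "consecutive prefix only" rule can be sketched as below. This is a 
> hypothetical helper, not the actual Oak recovery code; entries are keyed by 
> the index recovered from the file-name prefix:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: recover segments in index order, stopping at the first gap.
class RecoverySketch {

    static <T> List<T> recoverConsecutive(Map<Integer, T> entries) {
        List<T> recovered = new ArrayList<>();
        TreeMap<Integer, T> sorted = new TreeMap<>(entries);
        int expected = sorted.isEmpty() ? 0 : sorted.firstKey();
        for (Map.Entry<Integer, T> e : sorted.entrySet()) {
            if (e.getKey() != expected) {
                break; // gap found: the remaining segments are not recovered
            }
            recovered.add(e.getValue());
            expected++;
        }
        return recovered;
    }
}
```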
> h3. TODO
> * move the implementation to its own bundle (requires OSGi support for 
> SegmentArchiveManager).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
