[ 
https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866114#comment-13866114
 ] 

Haohui Mai commented on HDFS-5722:
----------------------------------

Had an offline discussion with @Jing Zhao, and dug into the original jira 
(HDFS-1435) that did the compression work.

One concern is that writing the FSImage uncompressed to disk might increase 
disk I/O. The following table shows that this does not seem to be a problem:

https://issues.apache.org/jira/browse/HDFS-1435?focusedCommentId=12921060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12921060

Based on the data, I think it makes sense to move compression out of the 
FSImage format. The code can either compress the data on the fly when 
transferring it through HTTP, or write the FSImage uncompressed onto the disk 
and then compute the digest and compress the whole file in the background. Both 
solutions can reduce the time that the NN spends in safe mode when saving the 
namespace.
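
To make the two options concrete, here is a rough sketch in Python rather than 
the actual Hadoop Java code. The function names, the gzip codec, the chunk 
size, and the use of MD5 as the digest are all assumptions made for 
illustration; neither function corresponds to an existing HDFS API.

```python
import gzip
import hashlib
import zlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # arbitrary buffer size chosen for this sketch


def gzip_stream(path, chunk_size=CHUNK_SIZE):
    """Option 1 (hypothetical helper): yield gzip-compressed chunks of an
    uncompressed file, as an HTTP handler could while sending the body."""
    # wbits=31 asks zlib for a gzip container (16 + 15-bit window).
    compressor = zlib.compressobj(wbits=31)
    with open(path, "rb") as src:
        while chunk := src.read(chunk_size):
            data = compressor.compress(chunk)
            if data:  # the compressor may buffer and return nothing yet
                yield data
    yield compressor.flush()


def digest_and_compress(path, chunk_size=CHUNK_SIZE):
    """Option 2 (hypothetical helper): read the uncompressed file once,
    computing its MD5 digest and writing a gzip copy next to it.
    Returns (hex digest of the uncompressed bytes, path of the copy)."""
    md5 = hashlib.md5()
    out_path = Path(str(path) + ".gz")
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            md5.update(chunk)
            dst.write(chunk)
    return md5.hexdigest(), out_path
```

In either case the file on disk stays uncompressed, so a reader can still seek 
within it. The second function could run on a background thread after the 
namespace has been saved, which is the part that would keep the work out of 
the safe-mode window.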


> Implement compression in the HTTP server of SNN / SBN instead of FSImage
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5722
>                 URL: https://issues.apache.org/jira/browse/HDFS-5722
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>
> The current FSImage format supports compression: there is a field in the 
> header which specifies the compression codec used to compress the data in the 
> image. The main motivation was to reduce the number of bytes to be 
> transferred between SNN / SBN / NN.
> The main disadvantage, however, is that it requires the client to access the 
> FSImage in strictly sequential order. This might not fit well with the new 
> design of FSImage. For example, serializing the data in protobuf allows the 
> client to quickly skip data that it does not understand. The compression 
> built into the format, however, complicates the calculation of offsets and 
> lengths. Recovering from a corrupted, compressed FSImage is also non-trivial 
> as off-the-shelf tools like bzip2recover are inapplicable.
> This jira proposes to move the compression from the format of the FSImage to 
> the transport layer, namely, the HTTP server of SNN / SBN. This design 
> simplifies the format of FSImage, opens up the opportunity to quickly 
> navigate through the FSImage, and eases the process of recovery. It also 
> retains the benefits of reducing the number of bytes to be transferred across 
> the wire since there is compression on the transport layer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
