Adar Dembo created KUDU-2052:
--------------------------------

             Summary: Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
                 Key: KUDU-2052
                 URL: https://issues.apache.org/jira/browse/KUDU-2052
             Project: Kudu
          Issue Type: Bug
          Components: util
    Affects Versions: 1.4.0
            Reporter: Adar Dembo
            Assignee: Adar Dembo
            Priority: Critical


One of the changes in Kudu 1.4 is more comprehensive repair functionality during 
log block manager (LBM) startup. Amongst other things, this includes a heuristic 
to detect whether an LBM container consumes more disk space than it should, based 
on the live blocks in the container. If the heuristic fires, the LBM reclaims the 
extra disk space by truncating the end of the container and repunching all of 
the dead blocks in the container.

We brought up Kudu 1.4 on a large production cluster running xfs and observed 
pathologically slow startup times. On one node there was a three-hour gap between 
the last of the data directory processing and the end of LBM startup overall. 
That time can only be attributed to hole repunching, which is executed by the 
same set of thread pools that open the data directories.

Further research revealed that on xfs in el6, a hole punch via fallocate() 
_always_ includes an fsync() (in the kernel), even if the underlying data was 
already punched out. This isn't the case with ext4, nor does it appear to be 
the case with xfs in more modern kernels (though this hasn't been confirmed).

xfs provides the [XFS_IOC_UNRESVSP64 ioctl|https://linux.die.net/man/3/xfsctl], 
which can be used to deallocate space from a file. That sounds an awful lot like 
hole punching, and some quick performance tests show that it doesn't incur the 
cost of an fsync(). We should switch over to it when punching holes on xfs: 
certainly on older (i.e. el6) kernels, and potentially everywhere for 
simplicity's sake.
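
A minimal sketch of what the xfs-specific path could look like, assuming the 
xfsprogs development headers (<xfs/xfs.h>, per the xfsctl man page) supply 
XFS_IOC_UNRESVSP64 and struct xfs_flock64; the function name is illustrative 
and error handling is pared down:

{code}
// Sketch: deallocate [offset, offset + length) on an xfs file via ioctl,
// avoiding the in-kernel fsync() taken by fallocate() on el6 xfs.
#include <sys/ioctl.h>
#include <unistd.h>   // SEEK_SET
#include <xfs/xfs.h>  // XFS_IOC_UNRESVSP64, struct xfs_flock64

#include <cerrno>
#include <cstring>

// Returns 0 on success, or an errno on failure.
int XfsPunchHole(int fd, off_t offset, off_t length) {
  struct xfs_flock64 fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_whence = SEEK_SET;  // interpret l_start as an absolute file offset
  fl.l_start = offset;
  fl.l_len = length;       // number of bytes to deallocate
  if (ioctl(fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
    return errno;
  }
  return 0;
}
{code}

Presumably the caller would fall back to the fallocate() path when the ioctl 
isn't supported (e.g. on ext4 the call should fail with an errno rather than 
deallocate anything).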



