[ https://issues.apache.org/jira/browse/KUDU-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adar Dembo updated KUDU-2052:
-----------------------------
    Code Review: http://gerrit.cloudera.org:8080/7269
         Labels: data-scalability  (was: )

> Use XFS_IOC_UNRESVSP64 ioctl to punch holes on xfs filesystems
> --------------------------------------------------------------
>
>                 Key: KUDU-2052
>                 URL: https://issues.apache.org/jira/browse/KUDU-2052
>             Project: Kudu
>          Issue Type: Bug
>          Components: util
>   Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Critical
>              Labels: data-scalability
>
> One of the changes in Kudu 1.4 is a more comprehensive repair functionality
> in log block manager startup. Amongst other things this includes a heuristic
> to detect whether an LBM container consumes more disk space than it should,
> based on the live blocks in the container. If the heuristic fires, the LBM
> reclaims the extra disk space by truncating the end of the container and
> repunching out all of the dead blocks in the container.
> We brought up Kudu 1.4 on a large production cluster running xfs and observed
> pathologically slow startup times. On one node, there was a three hour gap
> between the last bit of data directory processing and the end of LBM startup
> in general. This time can only be attributed to hole repunching, which is
> executed by the same set of thread pools that open the data directories.
> Further research revealed that on xfs in el6, a hole punch via fallocate()
> _always_ includes an fsync() (in the kernel), even if the underlying data was
> already punched out. This isn't the case with ext4, nor does it appear to be
> the case with xfs in more modern kernels (though this hasn't been confirmed).
> xfs provides the [XFS_IOC_UNRESVSP64
> ioctl|https://linux.die.net/man/3/xfsctl], which can be used to deallocate
> space from a file. That sounds an awful lot like hole punching, and some
> quick performance tests show that it doesn't incur the cost of an fsync(). We
> should switch over to it when punching holes on xfs. Certainly on older (i.e.
> el6) kernels, and potentially everywhere for simplicity's sake.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
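
For readers unfamiliar with the ioctl proposed above, here is a rough sketch of what deallocating a byte range via XFS_IOC_UNRESVSP64 might look like. It is not the Kudu patch (that is in the linked code review) and is written in Python only for brevity. The struct layout and request number are mirrored by hand from my reading of the xfs headers and assume the asm-generic ioctl encoding used on x86 and ARM; the helper names (_iow, xfs_unresvsp64_arg, punch_hole_xfs) are hypothetical.

```python
import fcntl
import struct

# Hand-written mirror of struct xfs_flock64 (assumption, based on the xfs headers):
#   int16 l_type, int16 l_whence, int64 l_start, int64 l_len,
#   int32 l_sysid, uint32 l_pid, int32 l_pad[4]
# "@" asks for native alignment, so padding is inserted before l_start.
XFS_FLOCK64 = struct.Struct("@hhqqiI4i")


def _iow(ioc_type, nr, size):
    """Build a write-direction ioctl number (asm-generic encoding only)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(ioc_type) << 8) | nr


# XFS_IOC_UNRESVSP64 is _IOW('X', 43, struct xfs_flock64) in the xfs headers.
XFS_IOC_UNRESVSP64 = _iow("X", 43, XFS_FLOCK64.size)


def xfs_unresvsp64_arg(offset, length):
    """Pack the ioctl argument describing the byte range to deallocate."""
    # l_whence = 0 (SEEK_SET) makes l_start an absolute file offset.
    return XFS_FLOCK64.pack(0, 0, offset, length, 0, 0, 0, 0, 0, 0)


def punch_hole_xfs(fd, offset, length):
    """Deallocate [offset, offset + length) of an open file on xfs.

    Raises OSError if the filesystem does not support the ioctl.
    """
    fcntl.ioctl(fd, XFS_IOC_UNRESVSP64, xfs_unresvsp64_arg(offset, length))
```

A caller would open the container file read-write and call punch_hole_xfs(fd, offset, length) for each dead block. On a non-xfs filesystem the ioctl is expected to fail, so a real implementation would need to fall back to fallocate() there.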