This patch series introduces performance improvements for lzo. The improvements fall into two categories: general Arm-specific optimisations (e.g., more efficient memory access); and the introduction of a special case for handling runs of zeros (a common case for zram) using run-length encoding.
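For illustration only (this is not the actual lzo-rle bitstream format, and the marker byte and threshold below are hypothetical), the zero-run special case amounts to detecting a run of zero bytes and emitting it as one short token rather than feeding it through the generic match-finding path:

#include <stddef.h>
#include <stdint.h>

#define ZERO_RUN_MARKER	0x00	/* hypothetical marker byte */
#define MIN_ZERO_RUN	4	/* hypothetical minimum run worth encoding */

/*
 * Sketch: returns the number of input bytes consumed as a zero run
 * (0 if the run is too short to special-case). On success, writes a
 * 3-byte token (marker + 16-bit little-endian length) to out.
 */
static size_t emit_zero_run(const uint8_t *in, size_t in_len, uint8_t *out)
{
	size_t run = 0;

	while (run < in_len && run < 0xffff && in[run] == 0)
		run++;

	if (run < MIN_ZERO_RUN)
		return 0;	/* let the normal encoder handle it */

	out[0] = ZERO_RUN_MARKER;
	out[1] = (uint8_t)(run & 0xff);
	out[2] = (uint8_t)(run >> 8);
	return run;
}

The win comes from replacing per-byte match searching over long zero runs with a single cheap scan and a fixed-size token, which is why zram (whose pages are frequently zero-filled or sparsely populated) benefits the most.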
The introduction of RLE modifies the bitstream so that it can't be decoded by old versions of lzo (the new lzo-rle can correctly decode old bitstreams). To avoid possible issues where data is persisted on disk (e.g., squashfs), the final patch in this series splits lzo-rle out as a separate algorithm alongside lzo, so that the new lzo-rle is (by default) only used for zram and must be explicitly selected for other use-cases. This final patch could be omitted if the consensus is that we'd rather avoid a proliferation of lzo variants.

Overall, performance is improved by around 1.1x - 4.8x (data-dependent: data with many zero runs shows higher improvement). Under real-world testing with zram, time spent in (de)compression during swapping is reduced by around 27%.

The graph below shows the weighted round-trip throughput of lzo, lz4 and lzo-rle, for randomly generated 4k chunks of data with varying levels of entropy. (To calculate weighted round-trip throughput, compression performance is emphasised to reflect the fact that zram does around 2.25x more compression than decompression; results and overall trends are fairly similar for the unweighted case. A sketch of one possible weighting follows the contributor list below.)

https://drive.google.com/file/d/18GU4pgRVCLNN7wXxynz-8R2ygrY2IdyE/view

Contributors:
Dave Rodgman <dave.rodg...@arm.com>
Matt Sealey <matt.sea...@arm.com>
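For reference, one plausible way to weight round-trip throughput so that compression counts roughly 2.25x as much as decompression is a weighted harmonic mean of the two one-way throughputs. The exact formula used for the linked graph is not spelled out above, so treat the following purely as an illustration of that kind of weighting:

#include <stdio.h>

#define COMP_WEIGHT 2.25	/* zram does ~2.25x more compression than decompression */

/* comp_mbps / decomp_mbps: measured one-way throughputs in MB/s */
static double weighted_round_trip(double comp_mbps, double decomp_mbps)
{
	/* Weighted harmonic mean: compression time is counted 2.25x. */
	return (COMP_WEIGHT + 1.0) /
	       (COMP_WEIGHT / comp_mbps + 1.0 / decomp_mbps);
}

int main(void)
{
	/* Hypothetical numbers, purely to show the calculation. */
	printf("%.1f MB/s\n", weighted_round_trip(400.0, 1200.0));
	return 0;
}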