On 4/29/26 11:24 AM, Mikulas Patocka wrote:
Hi

I think the best way to support it would be to modify the VDO target
to use the asynchronous compression API (so that it could use arbitrary
algorithms). Then, support for IAA could be plugged in easily with
little or no extra code.

I agree that this is the right approach, but supporting arbitrary compression algorithms will require some significant changes on its own. dm-vdo currently has nowhere to store information about which algorithm was used for which blocks, so it would require reworking the metadata. (We would probably store the extra compression information in the compression block header.) This metadata rework in turn will require some effort to make sure we can still support existing users who want to keep using dm-vdo volumes in the current format.

I have a branch lying around that has done some of this work already. I set it aside because there didn't seem to be much interest in it and it is fairly complex, but it could certainly be brought back if it would be helpful to someone.
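
To make the metadata concern concrete, here is a minimal userspace sketch, assuming a versioned header: blocks in the legacy format keep their implied LZ4 meaning, and only a newer header version carries an algorithm identifier. The struct, enum, field names, and version numbers are all hypothetical and do not reflect the actual dm-vdo compressed block header layout.

```c
#include <stdint.h>

/* Hypothetical algorithm identifiers; not part of any real on-disk format. */
enum hyp_compressor {
	HYP_COMPRESSOR_LZ4 = 0,
	HYP_COMPRESSOR_DEFLATE = 1,
	HYP_COMPRESSOR_UNKNOWN = 255,
};

/* Hypothetical header: NOT the real dm-vdo compressed block header. */
struct hyp_block_header {
	uint32_t major_version; /* 1 = legacy (implied LZ4), 2 = has algorithm */
	uint8_t algorithm;      /* only meaningful when major_version == 2 */
};

/* Pick the decompressor for a block while keeping legacy blocks readable. */
enum hyp_compressor hyp_block_algorithm(const struct hyp_block_header *header)
{
	if (header->major_version == 1)
		return HYP_COMPRESSOR_LZ4; /* existing volumes: LZ4 is implied */

	if (header->major_version == 2 &&
	    (header->algorithm == HYP_COMPRESSOR_LZ4 ||
	     header->algorithm == HYP_COMPRESSOR_DEFLATE))
		return (enum hyp_compressor)header->algorithm;

	return HYP_COMPRESSOR_UNKNOWN; /* unknown version or algorithm id */
}
```

The point of the legacy branch is that existing volumes need no new field and no rewrite to stay readable; only newly written blocks would pay for the extra header information.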

It is not good to have branches like "if (iaa_enabled) ... else ...;",
because that would quickly grow into an unmaintainable tangle of code as
other accelerators (maybe s390x?) are added.
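
As an illustration of the alternative to "if (iaa_enabled) ... else ..." branches, here is a userspace sketch of name-based dispatch through a table of backends, in the spirit of how the kernel crypto API selects an algorithm by a name string (for example, crypto_alloc_acomp() takes an algorithm name). The table, struct, and function names are hypothetical, and the stub "compressor" only copies data, so this shows the dispatch structure, not real compression.

```c
#include <stddef.h>
#include <string.h>

typedef int (*compress_fn)(const unsigned char *src, size_t len,
			   unsigned char *dst, size_t cap);

/* Hypothetical backend descriptor: a name plus an operation. */
struct compressor_ops {
	const char *name;
	compress_fn compress;
};

/* Stand-in for a real compressor: copies input if it fits, else fails. */
static int copy_stub(const unsigned char *src, size_t len,
		     unsigned char *dst, size_t cap)
{
	if (len > cap)
		return -1;
	memcpy(dst, src, len);
	return (int)len;
}

/* Adding an accelerator means adding a row here, not another branch. */
static const struct compressor_ops compressors[] = {
	{ "lz4", copy_stub },
	{ "deflate", copy_stub },
	{ "deflate-iaa", copy_stub },
};

/* Look up a backend by name; returns NULL if none is registered. */
const struct compressor_ops *find_compressor(const char *name)
{
	for (size_t i = 0; i < sizeof(compressors) / sizeof(compressors[0]); i++) {
		if (strcmp(compressors[i].name, name) == 0)
			return &compressors[i];
	}
	return NULL;
}

/* Compress with the named backend; returns output length or -1. */
int compress_with(const char *name, const unsigned char *src, size_t len,
		  unsigned char *dst, size_t cap)
{
	const struct compressor_ops *ops = find_compressor(name);

	return ops != NULL ? ops->compress(src, len, dst, cap) : -1;
}
```

The caller only ever passes a name, so the choice of algorithm becomes configuration rather than control flow.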

Also, to ease reviewing, it would be good to split the patch into
several smaller patches.

Mikulas


On Wed, 29 Apr 2026, ... wrote:


Hello,

I am following up here after an earlier reply suggested that Linux dm-vdo 
changes should be discussed on the device-mapper mailing list rather than as a 
GitHub PR.

Intel IAA, the In-Memory Analytics Accelerator, is a built-in accelerator in
recent Intel Xeon processors. One of its main uses is offloading compression
and decompression work from CPU cores. This is relevant to dm-vdo because VDO
already spends CPU time in the compressed write and read paths, and the kernel
already exposes IAA compression through the crypto API as the deflate-iaa
algorithm. The existing IAA crypto driver documentation
(dm-linux/Documentation/driver-api/crypto/iaa/iaa-crypto.rst) also describes
zswap as one consumer of this interface, so my prototype follows the same
general model for dm-vdo.

Before changing dm-vdo, I compared the current LZ4 path with IAA on a set of
single-thread compression tests. For compression, IAA hardware was faster than
LZ4 on most tested datasets, with speedups in measured compression time often
around 1.3x to 3.5x over LZ4. The compression ratio was also generally
comparable to or better than LZ4's, because the IAA path uses DEFLATE rather
than LZ4. Decompression was more mixed: IAA hardware was close to or faster
than LZ4 on some datasets, but slower on others.

One reason I set the compression work aside is that it's not clear there are any performance benefits to using a different compression algorithm in dm-vdo (either for device throughput or for data efficiency). Raw comparisons of different algorithms don't necessarily translate into gains for dm-vdo, and it's a lot of work to go through if it turns out that there's no benefit that users can see.

You don't provide many details on how you did the benchmarking you show here (in particular, it's not clear how compressible your test data is). But the results you show don't seem to show much of a difference, which raises the question of why we should do this at all.

[Figure 1: please see figure1 in the attachments.]

The prototype is in:

   https://github.com/dm-vdo/dm-linux/pull/96

It contains two commits:

   dm vdo: add minimal IAA compression support

   dm vdo: preserve compressed block format for IAA

The change is intentionally narrow and only touches the dm-vdo compression and
decompression paths under `drivers/md/dm-vdo/`. The design is shown in the
figure below:

[Figure 2: please see figure2 in the attachments.]

The current result is:

1. dm-vdo gets an optional iaa_enabled module parameter.

2. On writes, compress_data_vio() first tries deflate-iaa through the async
compression crypto API. If IAA is disabled, unavailable, or the IAA compression
attempt fails, it falls back to the existing LZ4 path.

3. On reads, uncompress_data_vio() can decode data produced by the IAA path. It 
tries IAA-assisted DEFLATE first, then software zlib inflate, and then the 
existing LZ4 path.

4. The prototype preserves the existing compressed block format. I did not add 
a new on-disk compressor field in this version.

5. Data written through the IAA path is not dependent on IAA hardware being 
present later, because it can still be decoded by the software DEFLATE fallback.
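
The read-side order in item 3 can be modeled as an ordered chain of decoders in which the first one that succeeds wins. This is a minimal userspace sketch under that assumption; the struct, function, and decoder names are hypothetical stand-ins for the IAA, software inflate, and LZ4 paths, not the prototype's code. Because the format in item 4 carries no compressor field, the order of the chain is what decides which decoder handles a given block.

```c
#include <stddef.h>

typedef int (*decode_fn)(const void *src, size_t len, void *dst, size_t cap);

/* Hypothetical decoder entry in a fallback chain. */
struct decoder {
	const char *name;
	decode_fn decode; /* returns >= 0 on success, < 0 on failure */
};

/*
 * Try each decoder in order and stop at the first success.
 * Returns the index of the decoder that succeeded, or -1 if none did.
 */
int decode_with_fallback(const struct decoder *chain, size_t count,
			 const void *src, size_t len, void *dst, size_t cap)
{
	for (size_t i = 0; i < count; i++) {
		if (chain[i].decode != NULL &&
		    chain[i].decode(src, len, dst, cap) >= 0)
			return (int)i;
	}
	return -1;
}
```

In the RFC's terms, when IAA hardware is absent the first entry fails and the software inflate entry handles the block, which is the property item 5 relies on.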

In local testing, the prototype wrote and read IAA-compressed VDO data
correctly, including the software fallback case where IAA was not used for the
read. Performance results are shown below; with IAA enabled, performance
remains largely consistent with the original LZ4 path.

Write path:

Command: dd if=../mnt_ori/SRR.fastq of=./SSR.fastq bs=1M oflag=direct

IAA: 6430986673 bytes (6.4 GB, 6.0 GiB) copied, 4.98476 s, 1.3 GB/s

LZ4: 6430986673 bytes (6.4 GB, 6.0 GiB) copied, 4.91684 s, 1.3 GB/s
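
Working the dd figures through to three decimal places (with 1 GB = 10^9 bytes, as dd reports) puts a number on "largely consistent": in this single run of each, the IAA write took about 1.4% longer than the LZ4 write.

```latex
\text{IAA: } \frac{6430986673\ \text{B}}{4.98476\ \text{s}} \approx 1.290\ \text{GB/s},
\qquad
\text{LZ4: } \frac{6430986673\ \text{B}}{4.91684\ \text{s}} \approx 1.308\ \text{GB/s},
\qquad
\frac{4.98476\ \text{s}}{4.91684\ \text{s}} \approx 1.014
```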

Read path:

[Figure 3: please see figure3 in the attachments.]

The reason I think this may be worth discussing is that IAA gives dm-vdo a way
to offload compression work on systems that already have the accelerator,
while keeping the existing LZ4 path as the compatibility and fallback path. The
goal of this RFC is not to propose a final format or policy yet, but to check
whether this direction is acceptable before I spend more time preparing a
proper patch series.

I would appreciate feedback on whether using the existing kernel IAA crypto API
from dm-vdo is a reasonable direction, and whether preserving the current
compressed block format is acceptable for an initial RFC.

Best regards,

This is a minor point perhaps, but you didn't sign a name to this message. My understanding is that kernel contributions should be attributable to real people using their real names. Please keep that in mind for the future.

Matt Sakai

