On 4/29/26 11:24 AM, Mikulas Patocka wrote:
Hi
I think the best way to support it would be to modify the VDO target
to use the asynchronous compression API (so that it could use arbitrary
algorithms). Then, the support for IAA could be plugged in easily with
little or no extra code.
I agree that this is the right approach, but supporting arbitrary
compression algorithms will require some significant changes on its own.
dm-vdo currently has nowhere to store information about what algorithm
is used for which blocks, so it would require reworking the metadata.
(We would probably store the extra compression information in the
compression block header.) This metadata rework in turn will require
some effort to make sure we can continue to support existing users who
will want to continue to use dm-vdo volumes in the current format.
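
To illustrate what such a rework might involve, here is a rough user-space
sketch in Python. It is not dm-vdo code. The header layout (a version pair
followed by 14 fragment sizes), the version-2 extension, and the algorithm IDs
are all assumptions made up for illustration. The idea it shows is that a
reader can treat version-1 headers as implicitly all-LZ4, while a hypothetical
version-2 header carries one algorithm byte per fragment.

```python
import struct

SLOTS = 14                    # assumed number of fragment slots per block
ALG_LZ4, ALG_DEFLATE = 0, 1   # hypothetical algorithm IDs

# Assumed layouts: (major, minor) version, then one 16-bit size per slot.
# The hypothetical version 2 appends one algorithm byte per slot.
V1 = struct.Struct("<II" + "H" * SLOTS)
V2 = struct.Struct("<II" + "H" * SLOTS + "B" * SLOTS)


def pack_header(sizes, algorithms=None):
    """Pack a version-1 header, or version 2 when algorithms are given."""
    if algorithms is None:
        return V1.pack(1, 0, *sizes)
    return V2.pack(2, 0, *sizes, *algorithms)


def parse_header(raw):
    """Return (sizes, algorithms); old-format blocks are implicitly LZ4."""
    major, _minor = struct.unpack_from("<II", raw)
    if major == 1:
        fields = V1.unpack_from(raw)
        return list(fields[2:]), [ALG_LZ4] * SLOTS
    if major == 2:
        fields = V2.unpack_from(raw)
        return list(fields[2:2 + SLOTS]), list(fields[2 + SLOTS:])
    raise ValueError(f"unsupported compressed block version {major}")
```

Existing volumes would keep parsing through the version-1 branch unchanged,
which is the compatibility property described above.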
I have a branch lying around that has done some of this work already. I
set it aside because there didn't seem to be much interest in it and it
is fairly complex, but it could certainly be brought back if it would be
helpful to someone.
It is not good to have branches like "if (iaa_enabled) ... else ...;",
because that would just blow up into an unmaintainable bunch of code when
other accelerators (maybe s390x?) are added.
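
As a rough illustration of the alternative, a name-keyed table keeps callers
free of per-accelerator branches. This is a user-space Python sketch that uses
standard-library codecs as stand-ins. The table and helper names are
hypothetical, and it does not show how the kernel crypto API is implemented.

```python
import bz2
import lzma
import zlib

# Each entry maps an algorithm name to its (compress, decompress) pair.
# Adding another backend means adding one entry, not another if/else.
COMPRESSORS = {
    "deflate": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def get_compressor(name):
    """Look up a backend by name, much as a caller would request one by name."""
    try:
        return COMPRESSORS[name]
    except KeyError:
        raise ValueError(f"unknown compression algorithm: {name}") from None


def roundtrip(name, data):
    """Compress and decompress through whichever backend was selected."""
    compress, decompress = get_compressor(name)
    return decompress(compress(data))
```

The calling code in `roundtrip()` is identical for every backend, which is the
property the asynchronous compression API would give dm-vdo.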
Also, it would be good (to ease reviewing) to split the patch into
several smaller patches.
Mikulas
On Wed, 29 Apr 2026, ... wrote:
Hello,
I am following up here after an earlier reply suggested that Linux dm-vdo
changes should be discussed on the device-mapper mailing list rather than as a
GitHub PR.
Intel IAA, the In-Memory Analytics Accelerator, is a built-in accelerator in
recent Intel Xeon processors. One of its main uses is offloading compression
and decompression work from CPU cores. This is relevant to dm-vdo because VDO
already spends CPU time in the compressed write and read paths, and the kernel
already exposes IAA compression through the crypto API as the deflate-iaa
algorithm. The existing IAA crypto driver documentation
(dm-linux/Documentation/driver-api/crypto/iaa/iaa-crypto.rst) also describes
zswap as one consumer of this interface, so my prototype tries the same general
model for dm-vdo.
Before changing dm-vdo, I compared the current LZ4 path with IAA on a set of
single-thread compression tests. For compression, IAA hardware was faster than
LZ4 on most tested datasets, with the measured compression time often reduced
by a factor of about 1.3 to 3.5 compared with LZ4. The compression ratio was
also generally comparable to or better than LZ4, because the IAA path uses
DEFLATE rather than LZ4. Decompression was more mixed: IAA hardware was close
to LZ4 or faster on some datasets, but slower on others.
One reason I set the compression work aside is that it's not clear
there are any performance benefits to using a different compression
algorithm in dm-vdo (either for device throughput or for data
efficiency). Raw comparisons of different algorithms don't
necessarily translate into gains for dm-vdo, and it's a lot of work to go
through if it turns out that there's no benefit that users can see.
You don't provide a lot of details on how you did the benchmarking you
show here (in particular, it's not clear how much compression is
available in your test data). But the results you show don't seem to show
much of a difference, which raises the question: why do this at all?
Please see figure1 in the attachments.
The prototype is in:
https://github.com/dm-vdo/dm-linux/pull/96
It contains two commits:
dm vdo: add minimal IAA compression support
dm vdo: preserve compressed block format for IAA
The change is intentionally narrow and only touches the dm-vdo compression and
decompression path under `drivers/md/dm-vdo/`. The design is shown in the
figure below:
Please see figure2 in the attachments.
The current result is:
1. dm-vdo gets an optional iaa_enabled module parameter.
2. On writes, compress_data_vio() first tries deflate-iaa through the async
compression crypto API. If IAA is disabled, unavailable, or the IAA compression
attempt fails, it falls back to the existing LZ4 path.
3. On reads, uncompress_data_vio() can decode data produced by the IAA path. It
tries IAA-assisted DEFLATE first, then software zlib inflate, and then the
existing LZ4 path.
4. The prototype preserves the existing compressed block format. I did not add
a new on-disk compressor field in this version.
5. Data written through the IAA path is not dependent on IAA hardware being
present later, because it can still be decoded by the software DEFLATE fallback.
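
The read-side ordering in item 3 can be sketched in user space as a chain of
decoders tried in turn. This is only a model, not the prototype's code: the
`iaa_unavailable` and `lz4_stub` functions are hypothetical placeholders (LZ4
is not in the Python standard library), and the sketch assumes, following
item 5, that the stored fragment is a raw DEFLATE stream that a software
inflater accepts.

```python
import zlib

BLOCK_SIZE = 4096  # size of an uncompressed data block in this sketch


def iaa_unavailable(fragment):
    """Placeholder for the hardware path when no accelerator is present."""
    raise RuntimeError("deflate-iaa not available")


def inflate_sw(fragment):
    """Software raw-DEFLATE decode, bounded to one block of output."""
    inflater = zlib.decompressobj(-15)  # negative wbits: raw DEFLATE
    data = inflater.decompress(fragment, BLOCK_SIZE + 1)
    if not inflater.eof:
        raise ValueError("truncated or oversized DEFLATE stream")
    return data


def lz4_stub(fragment):
    """Placeholder for the existing LZ4 decode path."""
    raise NotImplementedError("LZ4 decode is not modelled here")


def decode_fragment(fragment, decoders):
    """Try each (name, decoder) in order; accept the first full block."""
    for name, decode in decoders:
        try:
            data = decode(fragment)
        except Exception:
            continue  # this decoder rejected the fragment; try the next
        if len(data) == BLOCK_SIZE:
            return name, data
    raise ValueError("no decoder accepted the fragment")


READ_CHAIN = [
    ("deflate-iaa", iaa_unavailable),
    ("deflate-sw", inflate_sw),
    ("lz4", lz4_stub),
]
```

Because nothing on disk records which compressor produced a fragment, a chain
like this has to identify the format by trial, which is the cost of keeping
the block format unchanged.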
In local testing, the prototype was able to write and read IAA-compressed VDO
data correctly, including the software fallback case when IAA was not used for
the read. The performance is shown below. As can be seen, it remains largely
consistent with the original LZ4-only code after adding IAA.
Write path:
Command: dd if=../mnt_ori/SRR.fastq of=./SSR.fastq bs=1M oflag=direct
IAA: 6430986673 bytes (6.4 GB, 6.0 GiB) copied, 4.98476 s, 1.3 GB/s
LZ4: 6430986673 bytes (6.4 GB, 6.0 GiB) copied, 4.91684 s, 1.3 GB/s
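
dd rounds both results to 1.3 GB/s, so the two runs look identical. As a quick
check, the throughput can be recomputed from the byte count and elapsed times
quoted above. By this arithmetic the IAA run was roughly 1.4% slower than the
LZ4 run, and with a single run of each it is unclear whether that is more than
noise. The variable names below are only for this sketch.

```python
def throughput_gbps(nbytes, seconds):
    """Throughput in decimal GB/s, the unit dd reports."""
    return nbytes / seconds / 1e9


NBYTES = 6430986673      # bytes copied in both runs
IAA_SECONDS = 4.98476    # elapsed time with IAA enabled
LZ4_SECONDS = 4.91684    # elapsed time with the existing LZ4 path

iaa_gbps = throughput_gbps(NBYTES, IAA_SECONDS)
lz4_gbps = throughput_gbps(NBYTES, LZ4_SECONDS)
slowdown = IAA_SECONDS / LZ4_SECONDS - 1  # relative extra time for IAA

print(f"IAA {iaa_gbps:.3f} GB/s, LZ4 {lz4_gbps:.3f} GB/s, "
      f"IAA slower by {slowdown:.1%}")
```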
Read path:
Please see figure3 in the attachments.
The reason I think this may be worth discussing is that IAA gives dm-vdo a way
to offload compression work on systems that already have the accelerator, while
keeping the existing LZ4 path as the compatibility and fallback path. The goal
of this RFC is not to propose a final format or policy yet, but to check
whether this direction is acceptable before I spend more time preparing a
proper patch series.
I would appreciate feedback on whether using the existing kernel IAA crypto API
from dm-vdo is a reasonable direction, and whether preserving the current
compressed block format is acceptable for an initial RFC.
Best regards,
This is a minor point perhaps, but you didn't sign a name to this
message. My understanding is that kernel contributions should be
attributable to real people using their real names. Please keep that in
mind for the future.
Matt Sakai