On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
> So in other words, this series slows down dm-crypt and crypto_skcipher
> for everyone to optimize for an out-of-tree driver. And there's also no
> benchmark showing that your driver is even worth it over just using the
> CPU.

I measured this on arm64 (Graviton3; dm-crypt with xts-aes-ce on a
RAM-backed device, fixed CPU frequency):
- 4 KiB random writes, 512-byte sectors: v4 as posted showed a ~5%
regression. Root cause (via ftrace): a per-bio kmalloc_array() for the
scatterlists, where the per-sector path uses dm-crypt's inline
sg_in[]/sg_out[] arrays.
- Reusing the inline arrays when the segment count fits (falling back
to the heap only for larger bios) removes the regression and brings
performance back to parity. This will be in the dm-crypt patch for v5.
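For reference, the shape of that fix is roughly the following userspace
sketch; it is not the v5 patch. `struct sg_entry` stands in for struct
scatterlist, calloc() stands in for kmalloc_array(), and the inline
capacity of 4 and all function names are assumptions for illustration.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scatterlist. */
struct sg_entry {
    void *page;
    unsigned int len;
    unsigned int offset;
};

#define INLINE_SG 4 /* assumed inline capacity, for illustration only */

/* Hypothetical per-request context holding the inline arrays. */
struct conv_ctx {
    struct sg_entry sg_in_inline[INLINE_SG];
    struct sg_entry sg_out_inline[INLINE_SG];
    struct sg_entry *sg_in;  /* points at inline array or heap buffer */
    struct sg_entry *sg_out;
};

/*
 * Use the inline arrays when nsegs fits; allocate only for larger bios.
 * Returns 0 on success, -1 on allocation failure.
 */
static int ctx_get_sgs(struct conv_ctx *c, size_t nsegs)
{
    if (nsegs <= INLINE_SG) {
        c->sg_in = c->sg_in_inline;
        c->sg_out = c->sg_out_inline;
        return 0;
    }
    c->sg_in = calloc(nsegs, sizeof(*c->sg_in));
    c->sg_out = calloc(nsegs, sizeof(*c->sg_out));
    if (!c->sg_in || !c->sg_out) {
        free(c->sg_in);  /* free(NULL) is a no-op */
        free(c->sg_out);
        c->sg_in = c->sg_out = NULL;
        return -1;
    }
    return 0;
}

/* Free only what was heap-allocated; inline arrays are left alone. */
static void ctx_put_sgs(struct conv_ctx *c)
{
    if (c->sg_in != c->sg_in_inline)
        free(c->sg_in);
    if (c->sg_out != c->sg_out_inline)
        free(c->sg_out);
    c->sg_in = c->sg_out = NULL;
}
```

The common small-bio case then never touches the allocator, which is
what the ftrace data pointed at.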
So after the fix the software path is neutral, not slower. There is no
software throughput win either: the auto-splitter still calls
alg->encrypt once per data unit. The win is for consumers that can take
the whole request in one pass, such as a HW engine or any async offload
engine that pays a fixed per-request cost; today such an engine pays
that cost once per sector instead of once per bio.
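To make the per-request cost argument concrete: under the auto-splitter
a 4 KiB bio with 512-byte data units becomes 4096 / 512 = 8 per-unit
calls, so an engine with a fixed per-request cost pays it 8 times,
whereas a one-pass consumer would pay it once. A userspace sketch of
such a split loop follows; `split_request()`, `unit_fn` and the demo
consumer are hypothetical names, not the crypto API or the code in the
series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-data-unit hook; stands in for one alg->encrypt call. */
typedef int (*unit_fn)(const unsigned char *src, unsigned char *dst,
                       size_t len, unsigned long long dun, void *priv);

/*
 * Auto-split sketch: walk a request of total_len bytes in data_unit_size
 * steps and invoke fn once per unit, advancing the data unit number (dun).
 * Returns 0, -1 if the length is not a whole number of units, or the
 * first non-zero value returned by fn.
 */
static int split_request(const unsigned char *src, unsigned char *dst,
                         size_t total_len, size_t data_unit_size,
                         unsigned long long first_dun, unit_fn fn, void *priv)
{
    if (data_unit_size == 0 || total_len % data_unit_size != 0)
        return -1;
    for (size_t off = 0; off < total_len; off += data_unit_size) {
        int err = fn(src + off, dst + off, data_unit_size,
                     first_dun + off / data_unit_size, priv);
        if (err)
            return err;
    }
    return 0;
}

/* Demo consumer: copies data (no real cipher) and records each call. */
struct call_log {
    unsigned long calls;         /* number of per-unit invocations */
    unsigned long long last_dun; /* dun seen on the most recent call */
};

static int log_unit(const unsigned char *src, unsigned char *dst,
                    size_t len, unsigned long long dun, void *priv)
{
    struct call_log *log = priv;

    memcpy(dst, src, len);
    log->calls++;
    log->last_dun = dun;
    return 0;
}
```

Each call to `fn` is where a per-request fixed cost would be charged; a
one-pass consumer would instead receive the whole request and the
data_unit_size and do the per-unit iteration itself.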
I'd rather not over-complicate the patches until there's a general ack
on the direction: per-request data_unit_size plus auto-split, enabling
one-pass consumers while staying neutral for everyone else. Is that
direction acceptable? If so, I'll respin v5.
Thanks,
Leonid