Re: [PATCH v4 0/3] crypto: skcipher - per-request multi-data-unit batching

Leonid Ravich Mon, 22 Jun 2026 00:11:12 -0700

On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
> So in other words, this series slows down dm-crypt and crypto_skcipher
> for everyone to optimize for an out-of-tree driver.  And there's also no
> benchmark showing that your driver is even worth it over just using the
> CPU.


I measured on arm64 (Graviton3, dm-crypt + xts-aes-ce, RAM-backed,
fixed CPU freq):

  - 4 KiB random write, 512-byte sectors: v4 as posted regressed ~5%.
    Root cause (ftrace): a per-bio kmalloc_array() for the scatterlists,
    where the per-sector path uses dm-crypt's inline sg_in[]/sg_out[].

  - Reusing the inline arrays when the segment count fits (heap only for
    larger bios) removes the regression, back to parity. This will be in
    the dm-crypt patch for v5.
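
To make the fallback concrete, here is a minimal userspace sketch of
"inline when it fits, heap otherwise". It is not the v5 dm-crypt code:
struct seg stands in for struct scatterlist, and INLINE_SEGS, struct
req_ctx, ctx_get_segs() and ctx_put_segs() are hypothetical names with
an assumed capacity chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct scatterlist. */
struct seg {
	void *addr;
	size_t len;
};

#define INLINE_SEGS 4	/* assumed inline capacity, illustration only */

struct req_ctx {
	struct seg inline_segs[INLINE_SEGS];	/* embedded in the request context */
	struct seg *segs;			/* inline_segs or a heap array */
};

/* Pick storage for nsegs entries: inline when they fit, heap otherwise. */
static int ctx_get_segs(struct req_ctx *ctx, size_t nsegs)
{
	assert(ctx != NULL);

	if (nsegs <= INLINE_SEGS) {
		ctx->segs = ctx->inline_segs;	/* common case: no allocation */
		return 0;
	}

	ctx->segs = calloc(nsegs, sizeof(*ctx->segs));
	return ctx->segs ? 0 : -1;
}

/* Release the storage; only the heap path has anything to free. */
static void ctx_put_segs(struct req_ctx *ctx)
{
	if (ctx->segs != ctx->inline_segs)
		free(ctx->segs);
	ctx->segs = NULL;
}
```

The point of the shape is that the small-bio case never reaches the
allocator, so it costs the same as the per-sector path.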

So the software path is neutral after the fix, not slower. There is no
software throughput win either: the auto-splitter still calls
alg->encrypt once per data unit. The win is for a consumer that takes
the whole request in one pass: a HW engine, or any async offload engine
that pays a fixed per-request cost. Today such an engine pays that cost
once per sector instead of once per bio.
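
The per-request arithmetic can be sketched as below. This is a toy
model, not a measurement: per_request_cost is an arbitrary placeholder
unit, and requests_per_bio() and fixed_overhead() are hypothetical
helpers. For a 4 KiB bio with 512-byte sectors it gives 8 submissions
on the per-sector path against 1 when the whole bio is one request.

```c
#include <assert.h>
#include <stddef.h>

/* Number of requests submitted for one bio. */
static size_t requests_per_bio(size_t bio_bytes, size_t data_unit_size,
			       int batched)
{
	/* Assumes the bio is a whole number of data units. */
	assert(data_unit_size > 0 && bio_bytes % data_unit_size == 0);

	return batched ? 1 : bio_bytes / data_unit_size;
}

/*
 * Total fixed overhead for one bio, in the same arbitrary units as
 * per_request_cost. Only the per-request part is modelled; the
 * per-byte cipher work is the same in both cases and is left out.
 */
static unsigned long fixed_overhead(size_t bio_bytes, size_t data_unit_size,
				    unsigned long per_request_cost,
				    int batched)
{
	return per_request_cost *
	       requests_per_bio(bio_bytes, data_unit_size, batched);
}
```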

I'd rather not over-complicate the patches until there's a general
ack on the direction: per-request data_unit_size + auto-split,
enabling one-pass consumers, neutral for everyone else. Is that direction
acceptable? If so I'll respin v5.

Thanks,
Leonid
