On Wed, Jul 01, 2026 at 12:19:19AM -0700, Eric Biggers wrote:
> No, this didn't address my feedback.  It moved things around but still
> adds additional overhead for everyone to support an out-of-tree driver,
> which also hasn't been shown to be any better than just using the CPU.

Eric, thanks for the fast reply.

Overhead: for a non-user of dun() the only cost is the data_unit_size
field plus one zeroing store in set_tfm()/ON_STACK; the en/decrypt paths are
untouched.  A dun() user pays one indirect dispatch into the template per
request plus a scatterwalk step and IV copy per unit -- the same per-DU
bookkeeping the consumer already open-codes today.
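
To make "per-DU bookkeeping" concrete, here is a rough userspace sketch of
the kind of loop a consumer open-codes today: walk the buffer one data unit
at a time, build an IV from the data unit number, and issue one call per
unit.  This is not code from the patch.  for_each_data_unit(), dun_to_iv(),
du_fn and log_unit() are hypothetical names, and the little-endian 64-bit
IV layout is an assumption chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IV_SIZE 16

/* Hypothetical per-unit callback; stands in for one cipher request. */
typedef int (*du_fn)(uint8_t *data, size_t len,
		     const uint8_t iv[IV_SIZE], void *ctx);

/* Assumed IV layout: DUN as little-endian 64-bit, zero-padded to 16 bytes. */
static void dun_to_iv(uint64_t dun, uint8_t iv[IV_SIZE])
{
	memset(iv, 0, IV_SIZE);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(dun >> (8 * i));
}

/*
 * Split buf into du_size-byte units and call fn once per unit, with the
 * IV derived from first_dun, first_dun + 1, ...
 * Returns the number of units processed, or -1 on error.
 */
static long for_each_data_unit(uint8_t *buf, size_t len, size_t du_size,
			       uint64_t first_dun, du_fn fn, void *ctx)
{
	uint8_t iv[IV_SIZE];
	long units = 0;

	if (du_size == 0 || len % du_size != 0)
		return -1;

	for (size_t off = 0; off < len; off += du_size) {
		dun_to_iv(first_dun + (uint64_t)units, iv);	/* IV copy per unit */
		if (fn(buf + off, du_size, iv, ctx) != 0)	/* one dispatch per unit */
			return -1;
		units++;
	}
	return units;
}

/* Example callback that only records what it was called with. */
struct du_log {
	long calls;
	uint8_t last_iv0;
	size_t last_len;
};

static int log_unit(uint8_t *data, size_t len,
		    const uint8_t iv[IV_SIZE], void *ctx)
{
	struct du_log *log = ctx;

	(void)data;
	log->calls++;
	log->last_iv0 = iv[0];
	log->last_len = len;
	return 0;
}
```

In this model the number of fn() calls is len / du_size, which is the
per-request cost being discussed: the loop runs either in each consumer or
once inside the template.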

On the driver: I agree pushing code optimized for an out-of-tree driver
is wrong, but I don't think that's the case here -- this helps any async
crypto engine, and there are in-tree async xts(aes) ones dm-crypt is
eligible to use today: HiSilicon SEC2, TI DTHEv2, Atmel (I don't have any
to test on).  To bound the win, I used cryptd as a pure async carrier and
moved the per-DU split inside it, then ran dm-crypt + fio: batching cut
CPU ~30% on 128k I/O (large batch) and had zero impact on 4k -- so the
saving is dispatch, not crypto.  A real engine that submits a whole
multi-DU request in one descriptor avoids that per-DU dispatch entirely,
so it saves at least that.

So the question for me is what the bar is: does landing the API and dun()
template now (with the in-tree consolidation it already buys dm-crypt and
blk-crypto-fallback), with a throughput demonstration deferred to a real
async provider, work for you?

Thanks,
Leonid
