So I think it's kind of cute that you've implemented these as allocator-agnostic wrappers that work with any allocator ... but why?
I would have expected the functionality to be added directly to the allocator, as a way to explicitly request whole aligned pages. IIRC it's already capable of producing them; it just has no interface for asking. DirectIO doesn't really need a wide variety of allocation sizes or alignments: the alignment is always going to be the physical block size, which apparently can be as low as 512 bytes, but I'm guessing we're always going to be using 4kB alignment and allocations in multiples of 8kB. Wouldn't a pool of 8kB pages, all aligned on 4kB or 8kB boundaries, be simpler and more efficient than working around misaligned pointers with all these branches and arithmetic?
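
To make the suggestion concrete, here's a rough sketch of the kind of pool I have in mind. None of this is from the patch or the existing allocator: `IoPagePool`, the `io_pool_*` functions and the two size constants are names I made up for illustration, and I'm using C11 `aligned_alloc` as a stand-in for wherever the slab would really come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define IO_PAGE_SIZE  8192	/* assumed allocation unit */
#define IO_PAGE_ALIGN 4096	/* assumed DirectIO alignment */

/* Hypothetical fixed-size pool: one aligned slab carved into 8kB pages. */
typedef struct IoPagePool
{
	char	   *slab;		/* base of the aligned slab */
	void	   *free_head;	/* free list threaded through the free pages */
	size_t		npages;
} IoPagePool;

/* Allocate the slab and push every page onto the free list. */
static int
io_pool_init(IoPagePool *pool, size_t npages)
{
	if (npages == 0 || npages > SIZE_MAX / IO_PAGE_SIZE)
		return -1;

	/* aligned_alloc requires the size to be a multiple of the alignment */
	pool->slab = aligned_alloc(IO_PAGE_ALIGN, npages * IO_PAGE_SIZE);
	if (pool->slab == NULL)
		return -1;

	pool->npages = npages;
	pool->free_head = NULL;

	/* Push in reverse so that page 0 ends up at the head of the list. */
	for (size_t i = npages; i-- > 0;)
	{
		void	   *page = pool->slab + i * IO_PAGE_SIZE;

		*(void **) page = pool->free_head;
		pool->free_head = page;
	}
	return 0;
}

/* Pop one page; returns NULL when the pool is exhausted. */
static void *
io_pool_alloc(IoPagePool *pool)
{
	void	   *page = pool->free_head;

	if (page != NULL)
		pool->free_head = *(void **) page;
	return page;
}

/* Return a page previously handed out by io_pool_alloc. */
static void
io_pool_free(IoPagePool *pool, void *page)
{
	assert(((uintptr_t) page % IO_PAGE_ALIGN) == 0);
	*(void **) page = pool->free_head;
	pool->free_head = page;
}

static void
io_pool_destroy(IoPagePool *pool)
{
	free(pool->slab);
	pool->slab = NULL;
	pool->free_head = NULL;
}
```

Every pointer handed out is aligned by construction, so the alloc and free paths are a couple of loads and stores with no rounding arithmetic. A real version would still have to decide what to do on exhaustion and how this fits with memory context lifetimes, which the sketch ignores.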