virtio-blk's dataplane BH completion batching mechanism is not enabled by default and the performance results are mixed. If you develop a different mechanism from scratch I think there's a good chance it would work better :).
This looks like a queuing theory problem to me. It should be possible to model IOPS as a function of a few parameters and then, hopefully, find simple rules for optimizing IOPS by adjusting some of those parameters at runtime. I haven't looked into this much myself though, so I don't have any concrete suggestions.
The basic idea is that as long as events occur at some minimum rate, they can be batched to maximize throughput without sacrificing too much latency. If the rate drops below that minimum, the device can no longer hold back events, because each one would wait too long for the batch to fill.
Another place to look for inspiration is network cards. In Linux it's common to use the NAPI framework to disable further receive interrupts and then poll until the receive queue becomes empty. Transmit completions can also be mitigated, but I'm not sure what the most common approach is there.
Stefan
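
P.S. To make the minimum-rate idea a bit more concrete, here is a rough Python sketch. It is a toy model, not QEMU code. All the names (`RateGatedBatcher`, `max_gap_us`, `max_batch`) are made up for illustration, and treating "minimum rate" as a fixed inter-arrival threshold is only my assumption about one simple way to do it. A real mechanism would probably want to tune these at runtime.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RateGatedBatcher:
    """Toy model: hold back completions only while they keep arriving quickly."""
    max_gap_us: int   # hypothetical: largest inter-arrival gap that still counts as "busy"
    max_batch: int    # hypothetical: cap on held-back events, bounds the added latency
    pending: List[int] = field(default_factory=list)
    last_arrival_us: Optional[int] = None
    notifications: int = 0

    def on_completion(self, req_id: int, now_us: int) -> List[int]:
        """Record a completion; return the batch that gets notified now (maybe empty)."""
        busy = (self.last_arrival_us is not None
                and now_us - self.last_arrival_us <= self.max_gap_us)
        self.last_arrival_us = now_us
        self.pending.append(req_id)
        # Rate too low, or batch full: notify immediately instead of waiting.
        if not busy or len(self.pending) >= self.max_batch:
            return self.flush()
        return []

    def on_timer(self, now_us: int) -> List[int]:
        """Timer callback: if arrivals have stalled, stop holding events back."""
        if (self.pending and self.last_arrival_us is not None
                and now_us - self.last_arrival_us >= self.max_gap_us):
            return self.flush()
        return []

    def flush(self) -> List[int]:
        """Deliver everything pending as a single notification."""
        batch, self.pending = self.pending, []
        if batch:
            self.notifications += 1
        return batch
```

With `max_gap_us=1000` and `max_batch=4`, a lone completion is notified right away, a burst arriving every 500 us is coalesced into groups of four, and a straggler left over after the burst is flushed by the timer rather than waiting for a batch that never fills.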