On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>> 1) reduce the size of every kernel with block layer support, and
>>    even more for every kernel with scsi support
>
> By proposing the removal of bidi support from the block layer, it isn't
> just the SCSI subsystem that will be impacted. Those NVMe documents
> that you referred me to earlier in the year, in the command tables
> in 1.3c and earlier you have noticed the 2 bit direction field and
> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
> the fact that NVMe's 64 byte command format has provision for 4
> (not 2) independent data transfers (data + meta, for each direction).
> Surely NVMe will sooner or later take advantage of those ... a
> command like READ GATHERED comes to mind.

NVMe on the other hand does not have support for separate read and write
buffers as in the current SCSI bidi support, as it encodes the data
transfers in the SQE. So IFF NVMe does bidi commands it would have to use
a single buffer for data in/out, which can easily be done in the block
layer without the current bidi support that chains two struct request
instances for data in and data out.

>> 2) reduce the size of the critical struct request structure by
>>    128 bits, thus reducing the memory used by every blk-mq driver
>>    significantly, never mind the cache effects
>
> Hmm, one pointer (that is null in the non-bidi case) should be enough,
> that's 64 or 32 bits.

Due to the way we use request chaining we need two fields at the moment:
->special and ->next_rq. If we'd refactor the whole thing for the
basically non-existent user we could indeed probably get it down to a
single pointer.

> While on the subject of bidi, the order of transfers: is the data-out
> (to the target) always before the data-in or is it the target device
> that decides (depending on the semantics of the command) who is first?

The way I read SAM, data needs to be transferred to the device for
processing first, then the processing occurs, and then data is
transferred back out, so the order seems fixed.

>
> Doug Gilbert
>
> *** there could already be vendor specific bidi NVMe commands out
> there (ditto for SCSI)

For NVMe they'd need to transfer data in and out in the same buffer to
sort of work, and even then only if we don't happen to be bounce
buffering using swiotlb, or using a network transport. Similarly for
SCSI, only iSCSI supports bidi CDBs at the moment, so we could have
applications using vendor specific bidi commands on iSCSI, which is
exactly what we're trying to find out, but it is a very niche use case.