On Tue, 2019-03-05 at 16:27 -0500, Phillip Susi wrote:
> On 3/5/2019 10:58 AM, Kevin Locke wrote:
>> Sounds great. How do you propose that the kernel determine the
>> optimal alignment?
>
> md does it using the stripe size. Not sure if anything other than md or
> dm would make sense to populate the value. Well, I guess hardware raid
> drivers.

Sounds reasonable to me. Feel free to propose it to the kernel
maintainers.

>> the disk on which I am running parted is not a RAID array, I don't
>> think the documentation above says that it is anything more than
>> "preferred unit for sustained I/O".
>
> Yes, the first part says that, but then it goes on to say that normal
> disks generally leave it zero, and raid disks set it to the stripe width.

Documentation/ABI/testing/sysfs-block does not say "normal disks
generally leave it 0"; it says "If no optimal I/O size is reported this
file contains 0." SCSI disks report an optimal I/O size via VPD. I
still think the documentation here is correct. If you disagree, feel
free to report it to the kernel maintainers.

>>> Wait, how can optimal_io_size NOT be a multiple of the block size?
>>
>> For my disk, the logical block size is 512 bytes, the physical block
>> size is 4,096, opt_xfer_blocks is 65,535, so optimal_io_size is
>> 65,535*512=33,553,920 which is not a multiple of 4,096. I considered
>> advocating that the kernel check this, but decided against it.
>
> Oh, that is weird. I guess such a sanity check would fix the issue for
> your USB stick, but what about others?

Are there cases where the optimal partition alignment is not a multiple
of the physical sector size? If so, let's consider whether they can be
worked into the sanity checking logic. If not, are there other risks
that you foresee which are not shared by util-linux and cryptsetup,
which have been using such a sanity check for years?

Also, if "your USB stick" was intended to suggest that this is not a
common problem, I would disagree. I suspect it occurs on most/all
Seagate UAS drives (which share some other known problems[1]).

>> SCSI devices can report any value (measured in logical blocks) for VPD
>> Optimal Transfer Length. It is not restricted to multiples of the
>> physical block size. For my disk, it is not, which is the cause of
>> the current issue.
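
To make the arithmetic concrete, here is a minimal sketch in Python of
the kind of sanity check under discussion. The function name is made up
for illustration, and this is not the code used by parted, util-linux,
or cryptsetup; it only assumes that a value of 0 means "no usable
optimal I/O size".

```python
def usable_optimal_io_size(optimal_io_size, physical_block_size):
    """Return optimal_io_size only if it is a non-zero multiple of the
    physical block size; otherwise return 0 (hypothetical helper)."""
    if optimal_io_size == 0 or physical_block_size == 0:
        return 0
    if optimal_io_size % physical_block_size != 0:
        # Aligning to this value would misalign partitions.
        return 0
    return optimal_io_size


# Values reported for the disk described in this thread.
logical_block_size = 512
physical_block_size = 4096
opt_xfer_blocks = 65535

# optimal_io_size is the VPD value converted from logical blocks to bytes.
optimal_io_size = opt_xfer_blocks * logical_block_size

print(optimal_io_size)                        # 33553920
print(optimal_io_size % physical_block_size)  # 3584, so not 4 KiB aligned
print(usable_optimal_io_size(optimal_io_size, physical_block_size))  # 0
```

With these numbers the remainder is 3,584 bytes, i.e. the value falls
one 512-byte logical sector short of a 4,096-byte boundary, so the check
rejects it.
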
>
> So for 512e disks basically, the optimal transfer length can be not a
> multiple of physical block size and foolish drives try to specify the
> maximum possible value in logical 512 byte sectors, and that ends up
> being 1 logical sector too small to align to 4k. For 512n 4kn disks,
> the optimal size can never not be a multiple of the sector size, so the
> sanity check would pass and still give you a massive alignment you don't
> want.

I agree that is a potential issue. Sanity checking the size of
optimal_io_size would make sense. At the moment, I'm less concerned
about wasted space than misalignment, so I don't have an opinion on how
to handle that case.

Kevin

[1]: https://github.com/torvalds/linux/commit/7fee72d5e8f1