On Tue, 2019-03-05 at 08:49 -0500, Phillip Susi wrote:
> On 3/4/2019 5:29 PM, Kevin Locke wrote:
> > On 17 June 2015 at 01:08, Martin K. Petersen <martin.peter...@oracle.com> 
> > wrote:
>>> There's only so much we can do about devices that report garbage.
>>>
>>> Also, the kernel only reports things. It is up to Karel to decide
>>> whether to sanity check the values before he uses them.
>> 
>> If we are going to bring it up again, I'd like to have a specific
>> recommendation or request.  More discussion below:
> 
> It sounds like it wasn't intended to have anything to do with alignment,
> but since MD used it that way, parted interpreted it that way.  If
> optimal_io_size is just that, then maybe a new variable needs to be
> created to expose optimal alignment?

Sounds great.  How do you propose that the kernel determine the
optimal alignment?

>> Could you point me to the kernel documentation that mentions how the
>> md driver uses optimal_io_size?
> 
> Documentation/ABI/testing/sysfs-block:
> 
> Storage devices may report an optimal I/O size, which is
> the device's preferred unit for sustained I/O.  This is
> rarely reported for disk drives.  For RAID arrays it is
> usually the stripe width or the internal track size.

I disagree that what you quoted says that the md driver uses
optimal_io_size for anything, much less unconditionally.  Also, since
the disk on which I am running parted is not a RAID array, I don't
think the documentation above says that it is anything more than
"preferred unit for sustained I/O".

>> My current reading is that optimal_io_size has the same definition as
>> SCSI VPD Optimal Transfer Length.  It has a loosely defined meaning,
>> but its value for any particular use is contingent on sanity checking.
> 
> I'm not sure how you can sanity check it.  Either it has meaning
> relevant to alignment or it doesn't.  It sounds like it wasn't supposed
> to even though md used it that way.

I suggest sanity checking it in the same way that cryptsetup and
util-linux now do, by checking that it is a multiple of the physical
sector size or minimum_io_size.
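
To make that concrete, here is a minimal sketch of such a check.  It is
not the actual cryptsetup or util-linux code; the function name and the
choice to treat 0 as "nothing usable reported" are my own assumptions.
The caller passes whichever unit it wants to test against (the physical
block size or minimum_io_size), all in bytes.

```python
def usable_for_alignment(optimal_io_size, unit):
    """Return True if optimal_io_size looks safe to use for alignment.

    Hypothetical helper, not taken from any real tool.
    optimal_io_size: value in bytes, e.g. from queue/optimal_io_size
    unit: physical block size or minimum_io_size, in bytes
    """
    # Assumption: 0 (or a negative value) means no usable hint was reported.
    if optimal_io_size <= 0 or unit <= 0:
        return False
    # Only trust the value if it is a whole number of units.
    return optimal_io_size % unit == 0


if __name__ == "__main__":
    # 1 MiB is a multiple of 4096; 65535 * 512 bytes is not.
    print(usable_for_alignment(1048576, 4096))
    print(usable_for_alignment(65535 * 512, 4096))
```

A tool that gets False back would fall back to its usual default
alignment instead of using optimal_io_size.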

>> Do you have a particular proposal for how to improve the kernel
>> documentation or how optimal_io_size behaves?  I worked up a patch
>> which only uses logical_to_bytes(sdp, sdkp->opt_xfer_blocks) for
>> io_opt if it is a multiple of sdkp->physical_block_size, but I am not
>> convinced enough that it is universally applicable to advocate for it.
>> Any alternative suggestions?
> 
> Wait, how can optimal_io_size NOT be a multiple of the block size?

For my disk, the logical block size is 512 bytes, the physical block
size is 4,096, opt_xfer_blocks is 65,535, so optimal_io_size is
65,535*512=33,553,920 which is not a multiple of 4,096.  I considered
advocating that the kernel check this, but decided against it.
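
The arithmetic can be reproduced with a few lines of Python.  The three
constants are the values from my disk quoted above; the variable names
are only illustrative.

```python
# Values reported for the disk discussed in this thread.
LOGICAL_BLOCK_SIZE = 512     # bytes
PHYSICAL_BLOCK_SIZE = 4096   # bytes
OPT_XFER_BLOCKS = 65535      # VPD Optimal Transfer Length, in logical blocks

# optimal_io_size is the transfer length converted to bytes.
optimal_io_size = OPT_XFER_BLOCKS * LOGICAL_BLOCK_SIZE

# Split it into whole physical blocks plus whatever is left over.
whole_blocks, remainder = divmod(optimal_io_size, PHYSICAL_BLOCK_SIZE)

print(optimal_io_size)   # 33553920
print(whole_blocks)      # 8191
print(remainder)         # 3584
```

The remainder of 3,584 bytes (4,096 minus 512) is what makes the value
unusable as an alignment on this disk.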

SCSI devices can report any value (measured in logical blocks) for VPD
Optimal Transfer Length.  It is not restricted to multiples of the
physical block size.  On my disk it is not such a multiple, which is
the cause of the current issue.

Kevin
