On 01/29/2016 09:41 PM, Mike Christie wrote:
>> So something must have also changed with the initiator some
>> time after 3.16 that it now triggers the bug...
> 
> The block layer used to cap block/fs IO at I think around 1024 sectors.
> It now lets you go up to the min of the driver and what the target
> says it can handle.
> 
> Check out
> 
> /sys/block/sdX/queue/max_hw_sectors_kb
> and
> /sys/block/sdX/queue/max_sectors_kb
> 
> in the different kernel versions. In the newer kernel you will see
> max_sectors_kb a lot higher. You can work around the problem by manually
> setting that lower.

Yes. max_hw_sectors_kb is always 32767, but with the older
kernel max_sectors_kb is 512, with the newer one it's 8192.

Setting that down to 4096 doesn't give me any errors anymore,
even with the newer kernel on the initiator side.
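
For reference, a minimal sketch of that workaround in Python. The
function name and the sysfs_root parameter are made up for
illustration (the latter only so the code can be tried against a
scratch directory). Writing the real sysfs file needs root, and the
value does not survive a reboot.

```python
from pathlib import Path


def cap_max_sectors_kb(device, new_kb, sysfs_root="/sys/block"):
    """Lower max_sectors_kb for a block device.

    Returns (previous_value, new_value) in KiB. Refuses values above
    max_hw_sectors_kb, since the kernel would reject them anyway.
    """
    queue = Path(sysfs_root) / device / "queue"
    hw_limit = int((queue / "max_hw_sectors_kb").read_text())
    previous = int((queue / "max_sectors_kb").read_text())

    if new_kb > hw_limit:
        raise ValueError(
            f"{new_kb} KiB exceeds max_hw_sectors_kb ({hw_limit} KiB)"
        )

    # Equivalent to: echo <new_kb> > /sys/block/<device>/queue/max_sectors_kb
    (queue / "max_sectors_kb").write_text(f"{new_kb}\n")
    return previous, new_kb
```

Calling cap_max_sectors_kb("sdX", 4096) as root, with sdX replaced by
the iSCSI disk, corresponds to the setting I used above.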

(Target is still on the old kernel, as I'd have to reboot
that machine, so that'll have to wait until tomorrow.)

However:

> When you login to the target do a
> 
> sg_inq  -p 0xb0 /dev/sdX
> 
> and check if the optimal/max transfer lengths are 0. For the newer
> versions it should give you a non zero value.

No, even with the old target (still have to try newer one) I
get the following output:

VPD INQUIRY: Block limits page (SBC)
  Maximum compare and write length: 1 blocks
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 16384 blocks
  Optimal transfer length: 16384 blocks
  Maximum prefetch transfer length: 0 blocks

I assume 1 block == 512 bytes, so 16384 blocks would be 8 MiB
(8192 KiB). This matches max_sectors_kb on the new kernel.
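
The arithmetic, spelled out as a small Python sketch. The 512-byte
block size is my assumption here; the real value can be read from
/sys/block/sdX/queue/logical_block_size. The helper name is made up.

```python
def blocks_to_kib(blocks, block_size=512):
    """Convert a transfer length in logical blocks to KiB."""
    return blocks * block_size // 1024


# "Maximum transfer length: 16384 blocks" from the sg_inq output
max_transfer_kib = blocks_to_kib(16384)
print(max_transfer_kib)          # 8192, same as max_sectors_kb
print(max_transfer_kib // 1024)  # 8 (MiB)
```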

So it seems that the new kernel takes the maximum transfer
length from LIO and uses it as max_sectors_kb (which seems
to be correct behavior to me), while the old kernel just
assumes 1024 blocks as the limit. I therefore think the new
kernel behaves correctly, because the limit is set
according to what LIO reports. My target apparently can't
actually handle the transfer length it reports, so this
looks like a bug in at least older LIO versions.
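
To make that reasoning concrete, here is a simplified model in
Python of how I understand the limit to be chosen. This is not the
kernel's actual code; the function name, the legacy_cap flag and the
handling of a zero target limit as "not reported" are assumptions of
the sketch.

```python
LEGACY_CAP_KB = 512  # 1024 sectors of 512 bytes, the old block-layer cap


def effective_max_sectors_kb(max_hw_sectors_kb, target_max_kb,
                             legacy_cap=False):
    """Model the resulting max_sectors_kb for one device.

    max_hw_sectors_kb: what the driver can do
    target_max_kb:     maximum transfer length reported by the target
                       (0 is treated as "nothing reported")
    legacy_cap:        True models the old kernel's fixed cap
    """
    limit = max_hw_sectors_kb
    if target_max_kb:
        limit = min(limit, target_max_kb)
    if legacy_cap:
        limit = min(limit, LEGACY_CAP_KB)
    return limit


# Values from this thread: driver 32767 KiB, target 8192 KiB
print(effective_max_sectors_kb(32767, 8192))                   # 8192
print(effective_max_sectors_kb(32767, 8192, legacy_cap=True))  # 512
```

Both results match what I see in sysfs on the new and the old
kernel respectively.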

I'll check with a newer kernel on the target side and go on
from there (if it fixes the issue, I'll do some bisecting,
if it doesn't, I'll report to the LIO people).

Thanks for all your help, very much appreciated.

Regards,
Christian
