Re: BLKZEROOUT not zeroing md dev on VMDK

2016-06-15 Thread Sitsofe Wheeler
On Wed, Jun 15, 2016 at 06:17:37PM +, Arvind Kumar wrote:
> It is possibly some race. We saw a WRITE SAME related issue in the past
> for which Petr sent out a patch but it looks like the patch didn't make
> it. :(
> 
> https://groups.google.com/forum/#!topic/linux.kernel/1WGDSlyY0y0

Indeed - the investigation you folks did is linked to within the
upstream Bugzilla bug (see
https://bugzilla.kernel.org/show_bug.cgi?id=118581#c2 ). Hopefully this
issue will be resolved but there's still some debate over on
http://thread.gmane.org/gmane.linux.kernel/2236800 . The problem is that
it is causing real problems in stable kernels (data not being correctly
zero'd) today...

-- 
Sitsofe | http://sucs.org/~sits/


Re: BLKZEROOUT not zeroing md dev on VMDK

2016-06-15 Thread Arvind Kumar
It is possibly some race. We saw a WRITE SAME related issue in the past for which 
Petr sent out a patch but it looks like the patch didn't make it. :(

https://groups.google.com/forum/#!topic/linux.kernel/1WGDSlyY0y0

Thanks!
Arvind

From: Sitsofe Wheeler <sits...@gmail.com>
Sent: Tuesday, May 31, 2016 10:04 PM
To: Tom Yan
Cc: Darrick J. Wong; Shaohua Li; Jens Axboe; Arvind Kumar; VMware PV-Drivers; 
linux-r...@vger.kernel.org; linux-s...@vger.kernel.org; 
linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: BLKZEROOUT not zeroing md dev on VMDK

On 27 May 2016 at 10:30, Tom Yan <tom.t...@gmail.com> wrote:
> There seems to be some sort of race condition between
> blkdev_issue_zeroout() and the scsi disk driver (disabling write same
> after an illegal request). On my UAS drive, sometimes `blkdiscard -z
> /dev/sdX` will return right away, even though if I then check
> `write_same_max_bytes` it has turned 0. Sometimes it will just write
> zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
> I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
> expected).
>
> Not sure if it is directly related to the case here though.

I'm not aware of hitting that particular problem myself directly on
the underlying "SCSI" device but the patch on
https://patchwork.kernel.org/patch/9137311/ should be able to resolve
that issue. Could you test it and follow up on
http://permalink.gmane.org/gmane.linux.kernel/2229377 ? I'm hoping
more testing reports will lead to the patch being reviewed and
accepted sooner rather than later as it's currently stalled...

--
Sitsofe | http://sucs.org/~sits/


Re: BLKZEROOUT not zeroing md dev on VMDK

2016-05-31 Thread Sitsofe Wheeler
On 27 May 2016 at 10:30, Tom Yan  wrote:
> There seems to be some sort of race condition between
> blkdev_issue_zeroout() and the scsi disk driver (disabling write same
> after an illegal request). On my UAS drive, sometimes `blkdiscard -z
> /dev/sdX` will return right away, even though if I then check
> `write_same_max_bytes` it has turned 0. Sometimes it will just write
> zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
> I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
> expected).
>
> Not sure if it is directly related to the case here though.

I'm not aware of hitting that particular problem myself directly on
the underlying "SCSI" device but the patch on
https://patchwork.kernel.org/patch/9137311/ should be able to resolve
that issue. Could you test it and follow up on
http://permalink.gmane.org/gmane.linux.kernel/2229377 ? I'm hoping
more testing reports will lead to the patch being reviewed and
accepted sooner rather than later as it's currently stalled...

-- 
Sitsofe | http://sucs.org/~sits/


Re: BLKZEROOUT not zeroing md dev on VMDK

2016-05-27 Thread Tom Yan
There seems to be some sort of race condition between
blkdev_issue_zeroout() and the scsi disk driver (disabling write same
after an illegal request). On my UAS drive, sometimes `blkdiscard -z
/dev/sdX` will return right away, even though if I then check
`write_same_max_bytes` it has turned 0. Sometimes it will just write
zero with SCSI WRITE even if `write_same_max_bytes` is 33553920 before
I issue `blkdiscard -z` (`write_same_max_bytes` also turned 0, as
expected).

Not sure if it is directly related to the case here though.

On 27 May 2016 at 12:45, Sitsofe Wheeler  wrote:
> On 27 May 2016 at 05:18, Darrick J. Wong  wrote:
>>
>> It's possible that the pvscsi device advertised WRITE SAME, but if the device
>> sends back ILLEGAL REQUEST then the SCSI disk driver will set
>> write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
>> of zeroes to the drive.
>
> Thanks for following up on this but that's not what happens on the md
> device - you can go on to issue as many BLKZEROOUT requests as you
> like but the md disk is never zeroed nor is an error returned.
>
> I filed a bug at https://bugzilla.kernel.org/show_bug.cgi?id=118581
> (see https://bugzilla.kernel.org/show_bug.cgi?id=118581#c6 for
> alternative reproduction steps that use scsi_debug and can be reworked
> to impact device mapper) and Shaohua Li noted that
> blkdev_issue_write_same could return 0 even when the disk didn't
> support write same (see
> https://bugzilla.kernel.org/show_bug.cgi?id=118581#c8 ).
>
> Shaohua went on to create a patch for this ("block: correctly fallback
> for zeroout" - https://patchwork.kernel.org/patch/9137311/ ) which has
> yet to be reviewed.
>
> --
> Sitsofe | http://sucs.org/~sits/


Re: BLKZEROOUT not zeroing md dev on VMDK

2016-05-26 Thread Sitsofe Wheeler
On 27 May 2016 at 05:18, Darrick J. Wong  wrote:
>
> It's possible that the pvscsi device advertised WRITE SAME, but if the device
> sends back ILLEGAL REQUEST then the SCSI disk driver will set
> write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
> of zeroes to the drive.

Thanks for following up on this but that's not what happens on the md
device - you can go on to issue as many BLKZEROOUT requests as you
like but the md disk is never zeroed nor is an error returned.

I filed a bug at https://bugzilla.kernel.org/show_bug.cgi?id=118581
(see https://bugzilla.kernel.org/show_bug.cgi?id=118581#c6 for
alternative reproduction steps that use scsi_debug and can be reworked
to impact device mapper) and Shaohua Li noted that
blkdev_issue_write_same could return 0 even when the disk didn't
support write same (see
https://bugzilla.kernel.org/show_bug.cgi?id=118581#c8 ).

Shaohua went on to create a patch for this ("block: correctly fallback
for zeroout" - https://patchwork.kernel.org/patch/9137311/ ) which has
yet to be reviewed.
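
The difference between the reported behaviour and the intended fallback can be illustrated with a toy model in Python. This is not kernel code and does not attempt to model the md layer: `ToyDisk`, `issue_write_same`, `zeroout` and the `propagate_errors` switch are all hypothetical names, and the error code is an illustrative assumption. It only shows why a write-same helper that reports 0 on an unsupported command leaves the data untouched, while one that propagates the failure lets the caller fall back to writing zeroes.

```python
import errno

ADVERTISED_LIMIT = 33553920  # value reported in this thread before the first WRITE SAME


class ToyDisk:
    """Stand-in for a disk that advertises WRITE SAME but rejects the command."""

    def __init__(self, size, fill=0xAA):
        self.data = bytearray([fill]) * size
        self.write_same_max_bytes = ADVERTISED_LIMIT

    def write_same_zero(self, start, length):
        # The device answers ILLEGAL REQUEST; the driver then disables WRITE SAME.
        self.write_same_max_bytes = 0
        return -errno.EOPNOTSUPP  # illustrative error code (an assumption)

    def write(self, start, payload):
        self.data[start:start + len(payload)] = payload


def issue_write_same(disk, start, length, propagate_errors):
    status = disk.write_same_zero(start, length)
    # propagate_errors=False models the reported behaviour: the failure comes back as 0.
    return status if propagate_errors else 0


def zeroout(disk, start, length, propagate_errors):
    if disk.write_same_max_bytes:
        if issue_write_same(disk, start, length, propagate_errors) == 0:
            return 0  # caller believes the range has been zeroed
    disk.write(start, bytes(length))  # fallback: explicit writes of zeroes
    return 0
```

With `propagate_errors=False` the first `zeroout()` call returns 0 and the fill pattern is still in `disk.data`; with `propagate_errors=True` the same call ends with the range zeroed.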

-- 
Sitsofe | http://sucs.org/~sits/


Re: BLKZEROOUT not zeroing md dev on VMDK

2016-05-26 Thread Darrick J. Wong
On Wed, May 18, 2016 at 11:39:30PM +0100, Sitsofe Wheeler wrote:
> Hi,
> 
> With Ubuntu's 4.4.0-22-generic kernel and a Fedora 23
> 4.6.0-1.vanilla.knurd.1.fc23.x86_64 kernel I've found that the
> BLKZEROOUT ioctl can malfunction and not zero data.
> 
> When BLKZEROOUT is issued to an MD device atop a PVSCSI controller
> supplied VMDK from ESXi 6.0 the call returns immediately and with a zero
> return code. Unfortunately, inspecting the data on the MD device shows
> that it has not been zeroed and is in fact untouched. The easiest way to
> see this behaviour is to boot the VM, create an mdadm device atop
> /dev/sd?, scribble some non-zero value on the disk and then use
> blkdiscard --zeroout /dev/md??? . If you then inspect the MD disk (e.g.
> with hexdump) you will still see the old data and using POSIX_FADV_DONTNEED
> on the MD device doesn't change the outcome.
> 
> The only clue I've seen is that
> /sys/block/sd?/queue/write_same_max_bytes starts out being 33553920 but
> after a WRITE SAME is issued it becomes 0. If the MD device is created
> after write_same_max_bytes has become 0 on the backing disk then
> BLKZEROOUT seems to work correctly.

It's possible that the pvscsi device advertised WRITE SAME, but if the device
sends back ILLEGAL REQUEST then the SCSI disk driver will set
write_same_max_bytes=0.  Subsequent BLKZEROOUT attempts will then issue writes
of zeroes to the drive.

--D

> 
> -- 
> Sitsofe | http://sucs.org/~sits/


BLKZEROOUT not zeroing md dev on VMDK

2016-05-18 Thread Sitsofe Wheeler
Hi,

With Ubuntu's 4.4.0-22-generic kernel and a Fedora 23
4.6.0-1.vanilla.knurd.1.fc23.x86_64 kernel I've found that the
BLKZEROOUT ioctl can malfunction and not zero data.

When BLKZEROOUT is issued to an MD device atop a PVSCSI controller
supplied VMDK from ESXi 6.0 the call returns immediately and with a zero
return code. Unfortunately, inspecting the data on the MD device shows
that it has not been zeroed and is in fact untouched. The easiest way to
see this behaviour is to boot the VM, create an mdadm device atop
/dev/sd?, scribble some non-zero value on the disk and then use
blkdiscard --zeroout /dev/md??? . If you then inspect the MD disk (e.g.
with hexdump) you will still see the old data and using POSIX_FADV_DONTNEED
on the MD device doesn't change the outcome.
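
The check described above can be sketched in Python. This is only a sketch under assumptions: the helper names (`pack_range`, `zero_range`, `first_nonzero_offset`) are made up for illustration, and the request code is taken to be `_IO(0x12, 127)` as defined in `<linux/fs.h>`. It issues BLKZEROOUT over a byte range and then scans the range for any surviving non-zero byte.

```python
import fcntl
import struct

# BLKZEROOUT is _IO(0x12, 127) in <linux/fs.h>; (0x12 << 8) | 127 == 0x127F.
BLKZEROOUT = (0x12 << 8) | 127


def pack_range(start, length):
    """Pack the two native uint64 values (start and length, in bytes) the ioctl takes."""
    return struct.pack("QQ", start, length)


def zero_range(fd, start, length):
    """Issue BLKZEROOUT on an open block device fd; raises OSError on failure."""
    fcntl.ioctl(fd, BLKZEROOUT, pack_range(start, length))


def first_nonzero_offset(stream, length, chunk_size=1 << 20):
    """Return the offset of the first non-zero byte in the next `length` bytes, or None."""
    offset = 0
    while offset < length:
        chunk = stream.read(min(chunk_size, length - offset))
        if not chunk:
            break  # short device or file: stop at end of data
        stripped = chunk.lstrip(b"\0")
        if stripped:
            return offset + len(chunk) - len(stripped)
        offset += len(chunk)
    return None
```

Against a scratch device (the path `/dev/mdX` is a placeholder) this might be used by opening it with `os.open("/dev/mdX", os.O_RDWR)`, calling `zero_range(fd, 0, size)`, dropping cached pages with `os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)` and then reading back through `first_nonzero_offset()`. A successful ioctl followed by a non-None offset would match the behaviour reported here.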

The only clue I've seen is that
/sys/block/sd?/queue/write_same_max_bytes starts out being 33553920 but
after a WRITE SAME is issued it becomes 0. If the MD device is created
after write_same_max_bytes has become 0 on the backing disk then
BLKZEROOUT seems to work correctly.
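
Watching that sysfs value before and after a zeroout attempt can be scripted along these lines. Again this is a sketch: the function names and the `sysfs_root` parameter (there so the code can be pointed at a test directory) are assumptions, and the attribute may not exist on kernels other than the ones mentioned here, in which case `None` is returned.

```python
from pathlib import Path


def write_same_max_bytes(disk, sysfs_root="/sys/block"):
    """Return the queue's write_same_max_bytes as an int, or None if the attribute is absent."""
    attr = Path(sysfs_root) / disk / "queue" / "write_same_max_bytes"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        return None


def write_same_was_disabled(before, after):
    """True when a previously non-zero limit has dropped to 0."""
    return bool(before) and after == 0
```

Calling `write_same_max_bytes("sdb")` (with `sdb` standing in for the backing disk) before and after `blkdiscard --zeroout`, and passing both results to `write_same_was_disabled()`, would flag the 33553920 to 0 transition described above.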

-- 
Sitsofe | http://sucs.org/~sits/

