[PATCH] block: don't check request size in blk_cloned_rq_check_limits()

2016-05-30 Thread Hannes Reinecke
When checking a cloned request there is no need to check
the overall request size; this won't have changed even
when resubmitting to another queue.
Without this patch ppc64le on ibmvfc fails to boot.

Signed-off-by: Hannes Reinecke 
---
 block/blk-core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2475b1c7..e108bf0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2160,11 +2160,6 @@ EXPORT_SYMBOL(submit_bio);
 static int blk_cloned_rq_check_limits(struct request_queue *q,
 				      struct request *rq)
 {
-	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
-		printk(KERN_ERR "%s: over max size limit.\n", __func__);
-		return -EIO;
-	}
-
 	/*
 	 * queue's settings related to segment counting like q->bounce_pfn
 	 * may differ from that of other stacking queues.
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-30 Thread Wei Fang
Hi James, Christoph,

On 2016/5/29 23:41, James Bottomley wrote:
> On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
>> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
>>> async_sas_ata_eh(), which calls scsi_eh_finish_cmd() in some
>>> cases, can be executed concurrently in
>>> sas_ata_strategy_handler(). In that case ->host_failed may be
>>> decremented simultaneously in scsi_eh_finish_cmd() on different
>>> CPUs and end up with a wrong value.
>>>
>>> This leaves ->host_failed permanently unequal to ->host_busy, so
>>> the SCSI error handler thread never runs and subsequent SCSI
>>> errors are never handled.
>>>
>>> Use an atomic type for ->host_failed to fix this race.
>>
>> Looks fine,
> 
> Actually, it doesn't look fine at all.  The same mechanism that's
> supposed to protect the host_failed decrement is also supposed to
> protect the list_move_tail().  If there's a problem with the former
> then we're also in danger of corrupting the list.

The scmd is moved to a local eh_done_q list here, and I checked that
this list is not touched concurrently.

> Can we go back to the theory of what the problem is, since it's not
> spelled out very clearly in the change log.  Our usual reason for not
> requiring locking in eh routines is that the eh is single threaded on
> the eh thread per host, so any host manipulations can't have
> concurrency problems.  In this case, the sas_ata routines are trying to
> be clever and use asynchronous workqueues for the port error handler
> and you theorise that these can execute concurrently on two CPUs, thus
> causing the problem?

Yes, that is the case. The port error handler work items are queued on
system_unbound_wq and can run concurrently on different CPUs.
We have already hit this problem on our machine.

Thanks,
Wei




Re: [PATCH] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-30 Thread Wei Fang
Hi, Christoph,

On 2016/5/29 14:54, Christoph Hellwig wrote:
> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
>> async_sas_ata_eh(), which calls scsi_eh_finish_cmd() in some cases,
>> can be executed concurrently in sas_ata_strategy_handler(). In that
>> case ->host_failed may be decremented simultaneously in
>> scsi_eh_finish_cmd() on different CPUs and end up with a wrong value.
>>
>> This leaves ->host_failed permanently unequal to ->host_busy, so
>> the SCSI error handler thread never runs and subsequent SCSI
>> errors are never handled.
>>
>> Use an atomic type for ->host_failed to fix this race.
> 
> Looks fine,
> 
> Reviewed-by: Christoph Hellwig 
> 
> But please also update Documentation/scsi/scsi_eh.txt for this
> change.

Thanks for reviewing the patch.
I looked through the file but couldn't find the part that should be
updated. Could you point it out to me?

Thanks,
Wei




[PATCH] bnx2fc: replace printk() with BNX2FC_IO_DBG()

2016-05-30 Thread Maurizio Lombardi
The "fcp_rsp_code = %d" message isn't an error, it's meant to
be informative only.
This patch prevents a flood of such messages in some situations.

Tested-by: Laurence Oberman 
Signed-off-by: Maurizio Lombardi 
---
 drivers/scsi/bnx2fc/bnx2fc_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c
index 026f394..8f24d60 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -1758,7 +1758,7 @@ static void bnx2fc_parse_fcp_rsp(struct bnx2fc_cmd *io_req,
 	if ((fcp_rsp_len == 4) || (fcp_rsp_len == 8)) {
 		/* Only for task management function */
 		io_req->fcp_rsp_code = rq_data[3];
-		printk(KERN_ERR PFX "fcp_rsp_code = %d\n",
+		BNX2FC_IO_DBG(io_req, "fcp_rsp_code = %d\n",
 			io_req->fcp_rsp_code);
 	}
 
-- 
Maurizio Lombardi



Re: [PATCH] bnx2fc: replace printk() with BNX2FC_IO_DBG()

2016-05-30 Thread Johannes Thumshirn
On Mon, May 30, 2016 at 10:41:01AM +0200, Maurizio Lombardi wrote:
> The "fcp_rsp_code = %d" message isn't an error, it's meant to
> be informative only.
> This patch prevents a flood of such messages in some situations.
> 
> Tested-by: Laurence Oberman 
> Signed-off-by: Maurizio Lombardi 

Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: unexpected sync delays in dpkg for small pre-allocated files on ext4

2016-05-30 Thread Gernot Hillier
Hi!

On 25.05.2016 01:13, Theodore Ts'o wrote:
> On Tue, May 24, 2016 at 07:07:41PM +0200, Gernot Hillier wrote:
>> We experience strange delays with kernel 4.1.18 during dpkg
>> package installation on an ext4 filesystem after switching from
>> Ubuntu 14.04 to 16.04. We can reproduce the issue with kernel 4.6.
>> Installation of the same package takes 2s with ext3 and 31s with
>> ext4 on the same partition.
>>
>> Hardware is an Intel-based server with Supermicro X8DTH board and
>> Seagate ST973451SS disks connected to an LSI SAS2008 controller (PCI
>> 0x1000:0x0072, mpt2sas driver).
[...]
>> To me, the problem looks comparable to
>> https://bugzilla.kernel.org/show_bug.cgi?id=56821 (even if we don't see
>> a full hang and there's no RAID involved for us), so a closer look on
>> the SCSI layer or driver might be the next step?
> 
> What I would suggest is to create a small test case which compares the
> time it takes to allocate 1 megabyte of memory, zero it, and then
> write one megabyte of zeros using the write(2) system call.  Then try
> writing one megabyte of zeros using the BLKZEROOUT ioctl.

Ok, this is my test code:

const int SIZE = 1*1024*1024;
char* buffer = malloc(SIZE);
uint64_t range[2] = { 0, SIZE };
int fd = open("/dev/sdb2", O_WRONLY);

bzero(buffer, SIZE);
write(fd, buffer, SIZE);
sync_file_range(fd, 0, 0, 2);

ioctl (fd, BLKZEROOUT, range);

close(fd);
free(buffer);

# strace -tt ./test-tytso
[...]
15:46:27.481636 open("/dev/sdb2", O_WRONLY) = 3
15:46:27.482004 write(3, "\0\0\0\0\0\0"..., 1048576) = 1048576
15:46:27.482438 sync_file_range(3, 0, 0, SYNC_FILE_RANGE_WRITE) = 0
15:46:27.482698 ioctl(3, BLKZEROOUT, [0, 10]) = 0
15:46:27.546971 close(3) = 0

So the write() and sync_file_range() in the first case take ~400 us
each while BLKZEROOUT takes... 60 ms. Wow.

> I'm pretty sure you'll see same issue with BLKZEROOUT ioctl, but at
> this point, we'll be able to send bug reports to the SCSI and block
> layer developers with something that makes this very clear that it has
> nothing to do with ext4.

Ok, I added the linux-scsi and mpt2sas maintainers to the thread. Can
you please help to narrow this down further?

> This way we can also do some differential diagnosis; given that I'm
> not seeing this complaint from most people, I suspect it will be a
> matter of adding some specific devices to a blacklist (so that even
> though the SCSI device claims to support WRITE SAME, we need to
> disable it because it has a really lousy implementation of the SCSI
> WRITE SAME command).

Even a BLKZEROOUT for 512 bytes takes nearly 5 ms on this machine. This
really qualifies as a "lousy implementation".

Any idea how I could find out whether disks or controller are to blame
here? Getting physical access to the machine and replacing disks might
take us some days, so any other suggestion is greatly appreciated!

-- 
With kind regards,

Gernot Hillier, Siemens AG
Corporate Technology, Corporate Competence Center Embedded Linux



Re: mpt3sas: memory allocation for firmware upgrade DMA memory question

2016-05-30 Thread Johannes Thumshirn
On Thu, May 26, 2016 at 03:46:58PM +0530, Chaitra Basappa wrote:
> Johannes,
[...]
> 
> If we use the GFP_KERNEL flag, the ioctl thread may hang or wait for a
> long time if it cannot get the required memory from the system.
> 
> We may need to test the patch below thoroughly, as I don't see it
> allocating several non-contiguous chunks of memory...

The question here is whether it would be possible to replace the
kmalloc() call with vmalloc() (which returns physically non-contiguous
memory) and then pass the allocated memory to an sg list?
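
To make the idea concrete: a vmalloc()ed buffer is only virtually
contiguous, so it would have to be described page by page. In the kernel
that would presumably mean looking up each page with vmalloc_to_page()
and adding it with sg_set_page(). The following user-space sketch only
models the splitting arithmetic; struct fake_sg, FAKE_PAGE_SIZE and
buf_to_sg() are hypothetical names and no real scatterlist is built.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_SIZE 4096u

/* Hypothetical stand-in for one scatterlist entry. */
struct fake_sg {
	uintptr_t page_addr;   /* start address of the page */
	unsigned int offset;   /* offset of the data within the page */
	unsigned int length;   /* number of bytes in this entry */
};

/*
 * Split the range [buf, buf + len) into per-page segments.
 * Returns the number of entries used, or -1 if max entries are too few.
 */
static int buf_to_sg(const void *buf, size_t len, struct fake_sg *sg, int max)
{
	uintptr_t addr = (uintptr_t)buf;
	int n = 0;

	while (len) {
		unsigned int off = (unsigned int)(addr % FAKE_PAGE_SIZE);
		size_t chunk = FAKE_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		if (n == max)
			return -1;

		sg[n].page_addr = addr - off;
		sg[n].offset = off;
		sg[n].length = (unsigned int)chunk;
		n++;

		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

An 8192-byte buffer that starts 256 bytes into a page ends up as three
entries of 3840, 4096 and 256 bytes.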

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-30 Thread James Bottomley
On Mon, 2016-05-30 at 15:27 +0800, Wei Fang wrote:
> Hi James, Christoph,
> 
> On 2016/5/29 23:41, James Bottomley wrote:
> > On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
> > > On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
> > > > async_sas_ata_eh(), which calls scsi_eh_finish_cmd() in some
> > > > cases, can be executed concurrently in
> > > > sas_ata_strategy_handler(). In that case ->host_failed may be
> > > > decremented simultaneously in scsi_eh_finish_cmd() on different
> > > > CPUs and end up with a wrong value.
> > > > 
> > > > This leaves ->host_failed permanently unequal to ->host_busy, so
> > > > the SCSI error handler thread never runs and subsequent SCSI
> > > > errors are never handled.
> > > > 
> > > > Use an atomic type for ->host_failed to fix this race.
> > > 
> > > Looks fine,
> > 
> > Actually, it doesn't look fine at all.  The same mechanism that's
> > supposed to protect the host_failed decrement is also supposed to
> > protect the list_move_tail().  If there's a problem with the former
> > then we're also in danger of corrupting the list.
> 
> The scmd is moved to a local eh_done_q list here, and I checked that
> this list is not touched concurrently.
>
> > Can we go back to the theory of what the problem is, since it's not
> > spelled out very clearly in the change log.  Our usual reason for 
> > not requiring locking in eh routines is that the eh is single 
> > threaded on the eh thread per host, so any host manipulations can't 
> > have concurrency problems.  In this case, the sas_ata routines are
> > trying to be clever and use asynchronous workqueues for the port 
> > error handler and you theorise that these can execute concurrently 
> > on two CPUs, thus causing the problem?
> 
> Yes, that is the case. The port error handler work items are queued on
> system_unbound_wq and can run concurrently on different CPUs.
> We have already hit this problem on our machine.

OK, add that to the changelog and also that this fixes

commit 50824d6c5657ce340e3911171865a8d99fdd8eba
Author: Dan Williams 
Date:   Sun Dec 4 01:06:24 2011 -0800

[SCSI] libsas: async ata-eh

Because that's where the concurrency rules weren't verified when this
async threading was added.

One final thing is that we don't need this replaced by atomics.  The
only atomic check we need is the up count, which is already serialised
by the host lock.  Nothing actually ever bothers with the down count,
so it can just be eliminated and host_failed set to zero after the
strategy handler is complete (but before scsi_restart_operations) in the
eh_thread.

Once this change is made, scsi_eh_finish_cmd() and
scsi_eh_flush_done_q() are safe provided the done_q list is not
modifiable by any other thread.
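
A user-space model of this scheme, as a sketch only: every fake_* name
below is hypothetical and merely mirrors the roles of the host lock,
scsi_eh_finish_cmd() and the eh thread. The point it illustrates is that
the count only goes up under the lock and is reset once by the single eh
thread, so the finish path never touches it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a failed command and the host state. */
struct fake_cmd {
	struct fake_cmd *next;
};

struct fake_host {
	pthread_mutex_t host_lock;
	int host_failed;        /* only counted up, under host_lock */
};

/* Failure path: the up count is serialised by the host lock. */
static void fake_eh_scmd_add(struct fake_host *h)
{
	pthread_mutex_lock(&h->host_lock);
	h->host_failed++;
	pthread_mutex_unlock(&h->host_lock);
}

/* Finish path: no decrement, only a move onto the caller's done list. */
static void fake_eh_finish_cmd(struct fake_cmd *cmd, struct fake_cmd **done_q)
{
	cmd->next = *done_q;
	*done_q = cmd;
}

/* One pass of the eh thread; returns the number of finished commands. */
static int fake_eh_thread_pass(struct fake_host *h, struct fake_cmd *cmds,
			       size_t n)
{
	struct fake_cmd *done_q = NULL;   /* local list, no other user */
	int finished = 0;

	/* Stands in for the strategy handler finishing every command. */
	for (size_t i = 0; i < n; i++)
		fake_eh_finish_cmd(&cmds[i], &done_q);

	/* Reset once, after the handler and before operations restart. */
	h->host_failed = 0;

	/* Stands in for flushing the done list. */
	for (struct fake_cmd *c = done_q; c; c = c->next)
		finished++;
	return finished;
}
```

In this model the finish path is safe to call from several workers only
if each worker uses its own done list, which matches the condition
stated above about the done_q list.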

As Christoph said, the documentation needs updating to reflect these
new concurrency rules.

James



[PATCH] scsi_debug: fix sleep in invalid context

2016-05-30 Thread Douglas Gilbert
In this post: http://www.spinics.net/lists/linux-scsi/msg97124.html
the author shows some kernel infrastructure complaining about a
sleep in an invalid context. For calls to fetch memory when
processing SCSI commands, reviewers often propose non GFP_ATOMIC
variants; reality dictates otherwise. Fix memory allocation for
response to REPORT LUNS command.

Signed-off-by: Douglas Gilbert 
---
 drivers/scsi/scsi_debug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 0f9ba41..b85c5dc 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -3331,13 +3331,12 @@ static int resp_report_luns(struct scsi_cmnd *scp,
 	tlun_cnt = lun_cnt + wlun_cnt;
 
 	rlen = (tlun_cnt * sizeof(struct scsi_lun)) + 8;
-	arr = vmalloc(rlen);
+	arr = kzalloc(rlen, GFP_ATOMIC);
 	if (!arr) {
 		mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
 				INSUFF_RES_ASCQ);
 		return check_condition_result;
 	}
-	memset(arr, 0, rlen);
 	pr_debug("select_report %d luns = %d wluns = %d no_lun0 %d\n",
 		 select_report, lun_cnt, wlun_cnt, sdebug_no_lun_0);
 
@@ -3355,7 +3354,7 @@ static int resp_report_luns(struct scsi_cmnd *scp,
 	put_unaligned_be32(rlen - 8, &arr[0]);
 
 	res = fill_from_dev_buffer(scp, arr, rlen);
-	vfree(arr);
+	kfree(arr);
 	return res;
 }
 
-- 
2.7.4



Re: [PATCH] scsi_debug: fix sleep in invalid context

2016-05-30 Thread James Bottomley
On Mon, 2016-05-30 at 14:19 -0400, Douglas Gilbert wrote:
> In this post: http://www.spinics.net/lists/linux-scsi/msg97124.html
> the author shows some kernel infrastructure complaining about a
> sleep in an invalid context. For calls to fetch memory when
> processing SCSI commands, reviewers often propose non GFP_ATOMIC
> variants; reality dictates otherwise. Fix memory allocation for
> response to REPORT LUNS command.
> 
> Signed-off-by: Douglas Gilbert 
> ---
>  drivers/scsi/scsi_debug.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index 0f9ba41..b85c5dc 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -3331,13 +3331,12 @@ static int resp_report_luns(struct scsi_cmnd *scp,
>  	tlun_cnt = lun_cnt + wlun_cnt;
>  
>  	rlen = (tlun_cnt * sizeof(struct scsi_lun)) + 8;
> -	arr = vmalloc(rlen);
> +	arr = kzalloc(rlen, GFP_ATOMIC);
>  	if (!arr) {
>  		mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
>  				INSUFF_RES_ASCQ);
>  		return check_condition_result;
>  	}
> -	memset(arr, 0, rlen);
>  	pr_debug("select_report %d luns = %d wluns = %d no_lun0 %d\n",
>  		 select_report, lun_cnt, wlun_cnt, sdebug_no_lun_0);
>  
> @@ -3355,7 +3354,7 @@ static int resp_report_luns(struct scsi_cmnd *scp,
>  	put_unaligned_be32(rlen - 8, &arr[0]);
>  
>  	res = fill_from_dev_buffer(scp, arr, rlen);
> -	vfree(arr);
> +	kfree(arr);

This might fix the immediate warning, but won't it demand huge
contiguous memory chunks in high lun configurations and thus fail
randomly?  Report luns is important to us because if that fails the
target won't attach.

What about vmalloc'ing enough space at configuration time, when you do
have process context, and simply reusing the already allocated buffer
in this routine?  If you want to be clever, you could do a single
vmalloc for the biggest LUN size you have and reuse that buffer for
every report lun command with suitable locking ... we tend to fire off
report luns sequentially at start of day, so it's not like they have
huge performance or concurrency requirements.
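
A rough user-space sketch of that preallocation idea follows. All names
(struct report_luns_buf, rl_buf_init(), rl_buf_fill(), rl_buf_destroy())
are hypothetical, malloc() stands in for the configuration-time
vmalloc(), and a pthread mutex stands in for "suitable locking"; a
kernel version would need a lock that is legal in the context where the
command is processed.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer shared by all REPORT LUNS responses. */
struct report_luns_buf {
	pthread_mutex_t lock;
	unsigned char *data;
	size_t size;
};

/* Configuration time: allocate once for the largest LUN count. */
static int rl_buf_init(struct report_luns_buf *b, size_t max_luns)
{
	b->size = max_luns * 8 + 8;
	b->data = malloc(b->size);          /* stands in for vmalloc() */
	if (!b->data)
		return -1;
	if (pthread_mutex_init(&b->lock, NULL)) {
		free(b->data);
		return -1;
	}
	return 0;
}

/* Command time: no allocation, reuse the buffer under the lock. */
static int rl_buf_fill(struct report_luns_buf *b, size_t lun_cnt,
		       unsigned char *out, size_t out_len)
{
	size_t rlen = lun_cnt * 8 + 8;
	size_t list_len = lun_cnt * 8;

	if (rlen > b->size)
		return -1;

	pthread_mutex_lock(&b->lock);
	memset(b->data, 0, rlen);
	/* Bytes 0..3: LUN list length, big endian. */
	b->data[0] = (unsigned char)(list_len >> 24);
	b->data[1] = (unsigned char)(list_len >> 16);
	b->data[2] = (unsigned char)(list_len >> 8);
	b->data[3] = (unsigned char)list_len;
	/* One 8-byte entry per LUN, LUN number in the first two bytes. */
	for (size_t i = 0; i < lun_cnt; i++) {
		b->data[8 + i * 8] = (unsigned char)(i >> 8);
		b->data[8 + i * 8 + 1] = (unsigned char)i;
	}
	memcpy(out, b->data, rlen < out_len ? rlen : out_len);
	pthread_mutex_unlock(&b->lock);
	return 0;
}

static void rl_buf_destroy(struct report_luns_buf *b)
{
	pthread_mutex_destroy(&b->lock);
	free(b->data);
}
```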

James



Re: [PATCH] scsi_debug: fix sleep in invalid context

2016-05-30 Thread Christoph Hellwig
On Mon, May 30, 2016 at 12:02:45PM -0700, James Bottomley wrote:
> This might fix the immediate warning, but won't it demand huge
> contiguous memory chunks in high lun configurations and thus fail
> randomly?  Report luns is important to us because if that fails the
> target won't attach.
> 
> What about vmalloc'ing enough space at configuration time, when you do
> have process context, and simply reusing the already allocated buffer
> in this routine?  If you want to be clever, you could do a single
> vmalloc for the biggest LUN size you have and reuse that buffer for
> every report lun command with suitable locking ... we tend to fire off
> report luns sequentially at start of day, so it's not like they have
> huge performance or concurrency requirements.

There is no need for the allocation at all.  Instead of the big
array and a single call to fill_from_dev_buffer we can simply
have a single scsi_lun structure on stack that gets reused for
every lun and individual calls to sg_copy_from_buffer for each
one instead of a single big one.
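
As an illustration of the streaming approach, here is a user-space
sketch. Nothing in it is scsi_debug code: struct fake_scsi_lun,
copy_at() and report_luns_stream() are hypothetical, and copy_at()
merely stands in for a per-entry copy into the command's data buffer at
a given offset (in the kernel, sg_pcopy_from_buffer() takes such a skip
offset). The LUN bytes are laid out the way int_to_scsilun() does it for
a 16-bit LUN, which is an assumption about what the driver would want.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct scsi_lun (8 bytes per entry). */
struct fake_scsi_lun {
	uint8_t scsi_lun[8];
};

/*
 * Copy len bytes to dst at offset off, clamped to the destination size.
 * Returns the number of bytes actually copied.
 */
static size_t copy_at(uint8_t *dst, size_t dst_len, size_t off,
		      const void *src, size_t len)
{
	if (off >= dst_len)
		return 0;
	if (len > dst_len - off)
		len = dst_len - off;
	memcpy(dst + off, src, len);
	return len;
}

/*
 * Build a REPORT LUNS response for LUNs 0..lun_cnt-1 without any
 * temporary array. Returns the number of bytes written to dst.
 */
static size_t report_luns_stream(uint8_t *dst, size_t dst_len,
				 uint32_t lun_cnt)
{
	uint8_t hdr[8] = { 0 };
	struct fake_scsi_lun one;        /* single entry, reused per LUN */
	uint32_t list_len = lun_cnt * 8;
	size_t off = 0;

	/* Bytes 0..3: LUN list length, big endian; bytes 4..7 reserved. */
	hdr[0] = (uint8_t)(list_len >> 24);
	hdr[1] = (uint8_t)(list_len >> 16);
	hdr[2] = (uint8_t)(list_len >> 8);
	hdr[3] = (uint8_t)list_len;
	off += copy_at(dst, dst_len, off, hdr, sizeof(hdr));

	for (uint32_t lun = 0; lun < lun_cnt; lun++) {
		memset(&one, 0, sizeof(one));
		one.scsi_lun[0] = (uint8_t)(lun >> 8);
		one.scsi_lun[1] = (uint8_t)lun;
		off += copy_at(dst, dst_len, off, &one, sizeof(one));
	}
	return off;
}
```

The only per-command state is eight bytes on the stack, so the size of
the LUN list no longer affects how much memory the response path needs.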


Re: [PATCH] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-30 Thread Christoph Hellwig
On Mon, May 30, 2016 at 03:43:43PM +0800, Wei Fang wrote:
> I looked through the file but couldn't find the part that should be
> updated. Could you point it out to me?

Lines 255 and 266 in Documentation/scsi/scsi_eh.txt


Re: unexpected sync delays in dpkg for small pre-allocated files on ext4

2016-05-30 Thread Dave Chinner
On Mon, May 30, 2016 at 10:27:52AM +0200, Gernot Hillier wrote:
> Hi!
> 
> On 25.05.2016 01:13, Theodore Ts'o wrote:
> > On Tue, May 24, 2016 at 07:07:41PM +0200, Gernot Hillier wrote:
> >> We experience strange delays with kernel 4.1.18 during dpkg
> >> package installation on an ext4 filesystem after switching from
> >> Ubuntu 14.04 to 16.04. We can reproduce the issue with kernel 4.6.
> >> Installation of the same package takes 2s with ext3 and 31s with
> >> ext4 on the same partition.
> >>
> >> Hardware is an Intel-based server with Supermicro X8DTH board and
> >> Seagate ST973451SS disks connected to an LSI SAS2008 controller (PCI
> >> 0x1000:0x0072, mpt2sas driver).
> [...]
> >> To me, the problem looks comparable to
> >> https://bugzilla.kernel.org/show_bug.cgi?id=56821 (even if we don't see
> >> a full hang and there's no RAID involved for us), so a closer look on
> >> the SCSI layer or driver might be the next step?
> > 
> > What I would suggest is to create a small test case which compares the
> > time it takes to allocate 1 megabyte of memory, zero it, and then
> > write one megabyte of zeros using the write(2) system call.  Then try
> > writing one megabyte of zeros using the BLKZEROOUT ioctl.
> 
> Ok, this is my test code:
> 
>   const int SIZE = 1*1024*1024;
>   char* buffer = malloc(SIZE);
>   uint64_t range[2] = { 0, SIZE };
>   int fd = open("/dev/sdb2", O_WRONLY);
> 
>   bzero(buffer, SIZE);
>   write(fd, buffer, SIZE);
>   sync_file_range(fd, 0, 0, 2);
> 
>   ioctl (fd, BLKZEROOUT, range);
> 
>   close(fd);
>   free(buffer);
> 
> # strace -tt ./test-tytso
> [...]
> 15:46:27.481636 open("/dev/sdb2", O_WRONLY) = 3
> 15:46:27.482004 write(3, "\0\0\0\0\0\0"..., 1048576) = 1048576
> 15:46:27.482438 sync_file_range(3, 0, 0, SYNC_FILE_RANGE_WRITE) = 0
> 15:46:27.482698 ioctl(3, BLKZEROOUT, [0, 10]) = 0
> 15:46:27.546971 close(3) = 0
> 
> So the write() and sync_file_range() in the first case take ~400 us
> each while BLKZEROOUT takes... 60 ms. Wow.

Comparing apples to oranges.

Despite what the name implies, sync_file_range() does not provide any
data integrity semantics whatsoever: SYNC_FILE_RANGE_WRITE only submits
IO to clean dirty pages - that only takes 400us of CPU time.  It
does not wait for completion, nor does it flush the drive cache and
so by the time the syscall returns to userspace the IO may not have
even been sent to the device (e.g. it could be queued by the IO
scheduler in the block layer). i.e. you're not timing IO, you're
timing CPU overhead of IO submission.

For an apples to apples comparison, you need to use fsync() to
physically force the written data to stable storage and wait for
completion. This is what BLKZEROOUT is effectively doing, so I think
you'll find fdatasync() also takes around 60ms...
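
A sketch of such a comparison, under a few assumptions: the enum
flush_mode and timed_zero_write() names are hypothetical, the target is
a regular file rather than the /dev/sdb2 block device from the test
above, and BLKZEROOUT is left out because it only works on block
devices. The function times write() plus one of the two flush calls, so
the submission-only variant can be set against the one that waits for
stable storage.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

enum flush_mode { FLUSH_RANGE_WRITE, FLUSH_FDATASYNC };

/*
 * Write size zero bytes to path, flush them with the chosen call and
 * store the elapsed time of write + flush in *ms.
 * Returns 0 on success, -1 on any error.
 */
static int timed_zero_write(const char *path, size_t size,
			    enum flush_mode mode, double *ms)
{
	struct timespec t0, t1;
	char *buf = calloc(1, size);
	int fd, ret = -1;

	if (!buf)
		return -1;
	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd < 0)
		goto out_free;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (write(fd, buf, size) != (ssize_t)size)
		goto out_close;
	if (mode == FLUSH_FDATASYNC) {
		/* Waits until the data has reached stable storage. */
		ret = fdatasync(fd);
	} else {
#ifdef SYNC_FILE_RANGE_WRITE
		/* Only starts writeback; does not wait for completion. */
		ret = sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#endif
		/* Without the macro, ret stays -1 (mode unsupported). */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
	      (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
out_close:
	close(fd);
out_free:
	free(buf);
	return ret;
}
```

Calling it once per mode on the same file gives two numbers that can be
compared directly; only the FLUSH_FDATASYNC figure includes the time the
device needs to complete the IO.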

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com