Re: [PATCH v2 hulk-4.1-next 2/7] 9p: ->evict_inode() should kick out ->i_data, not, ->i_mapping

2018-01-18 Thread jiangyiwen
On 2018/1/19 11:46, jiangyiwen wrote:
> mainline inclusion
> from v4.4-rc5
> commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3
> category: bugfix
> bugzilla: 4404
> DTS: NA
> CVE: NA
> 补丁作用:对于块设备文件而言,->i_mapping与
> 普通文件不相同,如果文件中的数据来自于块设备
> 文件,则其->i_mapping将指向的是bdevfs inode
> (master inode)的i_data,而->i_data实际是
> empty的,因此在evict_inode的时候不应该释放
> ->i_mapping(bdevfs_inode->i_data)的所有
> pagecache;同时->i_mapping只有在文件被打开
> 的时候才能访问,此时已经是文件被关闭时不应
> 该访问该变量。
> 
> ---
> [ Upstream commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 ]
> 
> For block devices the pagecache is associated with the inode
> on bdevfs, not with the aliasing ones on the mountable filesystems.
> The latter have its own ->i_data empty and ->i_mapping pointing
> to the (unique per major/minor) bdevfs inode.  That guarantees
> cache coherence between all block device inodes with the same
> device number.
> 
> Eviction of an alias inode has no business trying to evict the
> pages belonging to bdevfs one; moreover, ->i_mapping is only
> safe to access when the thing is opened.  At the time of
> ->evict_inode() the victim is definitely *not* opened.  We are
> about to kill the address space embedded into struct inode
> (inode->i_data) and that's what we need to empty of any pages.
> 
> 9p instance tries to empty inode->i_mapping instead, which is
> both unsafe and bogus - if we have several device nodes with
> the same device number in different places, closing one of them
> should not try to empty the (shared) page cache.
> 
> Fortunately, other instances in the tree are OK; they are
> evicting from >i_data instead, as 9p one should.
> 
> Cc: sta...@vger.kernel.org # v2.6.32+, ones prior to 2.6.36 need only half of 
> that
> Reported-by: "Suzuki K. Poulose" <suzuki.poul...@arm.com>
> Tested-by: "Suzuki K. Poulose" <suzuki.poul...@arm.com>
> Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
> Signed-off-by: Yiwen Jiang <jiangyi...@huawei.com>
> ---
>  fs/9p/vfs_inode.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 3d1f365..9b96add 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -451,9 +451,9 @@ void v9fs_evict_inode(struct inode *inode)
>  {
>   struct v9fs_inode *v9inode = V9FS_I(inode);
> 
> - truncate_inode_pages_final(inode->i_mapping);
> + truncate_inode_pages_final(>i_data);
>   clear_inode(inode);
> - filemap_fdatawrite(inode->i_mapping);
> + filemap_fdatawrite(>i_data);
> 
>   v9fs_cache_inode_put_cookie(inode);
>   /* clunk the fid stashed in writeback_fid */
> 
Hi all,

Sorry, It's my fault, please ignore this email.

Thanks,
Yiwen.



[PATCH v2 hulk-4.1-next 2/7] 9p: ->evict_inode() should kick out ->i_data, not, ->i_mapping

2018-01-18 Thread jiangyiwen
mainline inclusion
from v4.4-rc5
commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3
category: bugfix
bugzilla: 4404
DTS: NA
CVE: NA
补丁作用:对于块设备文件而言,->i_mapping与
普通文件不相同,如果文件中的数据来自于块设备
文件,则其->i_mapping将指向的是bdevfs inode
(master inode)的i_data,而->i_data实际是
empty的,因此在evict_inode的时候不应该释放
->i_mapping(bdevfs_inode->i_data)的所有
pagecache;同时->i_mapping只有在文件被打开
的时候才能访问,此时已经是文件被关闭时不应
该访问该变量。

---
[ Upstream commit 4ad78628445d26e5e9487b2e8f23274ad7b0f5d3 ]

For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode.  That guarantees
cache coherence between all block device inodes with the same
device number.

Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened.  At the time of
->evict_inode() the victim is definitely *not* opened.  We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.

9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.

Fortunately, other instances in the tree are OK; they are
evicting from >i_data instead, as 9p one should.

Cc: sta...@vger.kernel.org # v2.6.32+, ones prior to 2.6.36 need only half of 
that
Reported-by: "Suzuki K. Poulose" 
Tested-by: "Suzuki K. Poulose" 
Signed-off-by: Al Viro 
Signed-off-by: Yiwen Jiang 
---
 fs/9p/vfs_inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 3d1f365..9b96add 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -451,9 +451,9 @@ void v9fs_evict_inode(struct inode *inode)
 {
struct v9fs_inode *v9inode = V9FS_I(inode);

-   truncate_inode_pages_final(inode->i_mapping);
+   truncate_inode_pages_final(>i_data);
clear_inode(inode);
-   filemap_fdatawrite(inode->i_mapping);
+   filemap_fdatawrite(>i_data);

v9fs_cache_inode_put_cookie(inode);
/* clunk the fid stashed in writeback_fid */
-- 
1.8.3.1




scsi: Adding lock to protect variables of bit-field in struct scsi_device

2016-12-04 Thread jiangyiwen
Hi guys,

I'm sorry if someone else has already asked the same question before,
but here's what we are facing with scsi mid-level.

Variables of bit-field in struct scsi_device are not protected by
lock, such as no_report_opcodes, is_visible and so on. I guess
everyone think these variable can't be accessed by multi-processes,
but unfortunately, I came across one problem as follows:

Assuming there is a process A which is executing the operation of
adding device, so it will call scsi_sysfs_add_sdev, and eventually
it will start an async kworker B to execute sd_probe_async.
In this way, process A will set "sdev->is_visible = 1" in the function
of scsi_sysfs_add_sdev, in the meantime, aysnc kworker B may set
"sdev->no_report_opcodes = 1" in the function of sd_read_write_same.
Neither of the variables are protected by any lock, setting the value
of no_report_opcodes may change value of is_visible from 1 to 0, which
alters the bit field unexpectedly. Did I miss anything?

So I'd like to know whether these variable of bit-field should
be protected by lock or not.

Please advise. Thanks in advance!

Best regards,
Yiwen

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled

2016-02-16 Thread jiangyiwen
On 2016/2/5 11:13, Martin K. Petersen wrote:
>>>>>> "Yiwen" == jiangyiwen  <jiangyi...@huawei.com> writes:
> 
> Yiwen,
> 
> Yiwen> First, I don't understand why blk_peek_request() return
> Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
> Yiwen> command without sending to device, and I think EREMOTEIO should
> Yiwen> be returned only when IO has already sent to device, maybe I
> Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
> Yiwen> the errno with EOPNOTSUPP?
> 
> DM currently has special handling for EREMOTEIO failures (because that's
> what we'd return when a device responds with ILLEGAL REQUEST).
> 
> I am not opposed to returning EOPNOTSUPP but it would require more
> changes and since this is a bugfix for stable I want to keep it as small
> as possible.
> 
> Yiwen> In addition, I still worried with whether there has other
> Yiwen> situations which will return EIO or other error. In this way,
> Yiwen> MD/DM still can happen this type of problem, so I think may be in
> Yiwen> multipath we still needs a protection to avoid it.
> 
> There are various error scenarios where we can end up bailing with a
> BLKPREP_KILL. But the general rule of thumb is that these conditions all
> demand a retry. The optional commands like WRITE SAME and UNMAP are
> special in that they are irrecoverable.
> 
> Yiwen> At last, I have a additional problem, I remember that you
> Yiwen> previously send a series of patches about XCOPY, why don't have
> Yiwen> any news latter later? I very much expect that I can see these
> Yiwen> patches which are merged into kernel.
> 
> I am working on a refresh of the series that includes token-based copy
> offload support in addition to EXTENDED COPY. The patches depend on Mike
> Christie's request flag patch series which has yet to be merged.
> 
Hi Martin,
I tested this code, but found a problem.

When called scsi_prep_fn return BLKPREP_INVALID, we should
use the same code with BLKPREP_KILL in scsi_prep_return. This
patch should add a line of code as follows:

---
 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..d8d2198 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1343,6 +1343,7 @@ scsi_prep_return(struct request_queue *q, struct request 
*req, int ret)

switch (ret) {
case BLKPREP_KILL:
+   case BLKPREP_INVALID:
req->errors = DID_NO_CONNECT << 16;
/* release the command and kill it */
if (req->special) {
-- 
1.8.4.3

Thanks,
Yiwen Jiang.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block: do not return -EOPNOTSUPP only when issue a discard request

2016-02-04 Thread jiangyiwen
commit 8af1954d172a("blkdev: Do not return -EOPNOTSUPP if discard
is supported") only solve the situation of discard, because When
applications issue a discard request to device, they can't expect
deterministic behaviour. However, WRITE SAME should not ignore error
with EOPNOTSUPP, because if applications issue WRITE SAME requests to
device, it should return deterministic results to applications
according to the T10 standard, or else it will cause an inconsistent
state between upper layer and bottom layer.

So ignoring error with EOPNOTSUPP only in discard situation.

Signed-off-by: Yiwen Jiang 
Reviewed-by: Lukas Czerner 
Reviewed-by: Martin K. Petersen 
---
 block/blk-lib.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..e4f02f1 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -19,7 +19,9 @@ static void bio_batch_end_io(struct bio *bio)
 {
struct bio_batch *bb = bio->bi_private;

-   if (bio->bi_error && bio->bi_error != -EOPNOTSUPP)
+   /* ignore -EOPNOTSUPP only when issue a discard request */
+   if (bio->bi_error && (!(bio->bi_rw & REQ_DISCARD) ||
+   (bio->bi_error != -EOPNOTSUPP)))
bb->error = bio->bi_error;
if (atomic_dec_and_test(>done))
complete(bb->wait);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled

2016-02-04 Thread jiangyiwen
On 2016/2/5 11:13, Martin K. Petersen wrote:
>>>>>> "Yiwen" == jiangyiwen  <jiangyi...@huawei.com> writes:
> 
> Yiwen,
> 
> Yiwen> First, I don't understand why blk_peek_request() return
> Yiwen> EREMOTEIO, as I know, in this situation we only prepare scsi
> Yiwen> command without sending to device, and I think EREMOTEIO should
> Yiwen> be returned only when IO has already sent to device, maybe I
> Yiwen> don't understand definition of EREMOTEIO.  So, Why don't return
> Yiwen> the errno with EOPNOTSUPP?
> 
> DM currently has special handling for EREMOTEIO failures (because that's
> what we'd return when a device responds with ILLEGAL REQUEST).
> 
> I am not opposed to returning EOPNOTSUPP but it would require more
> changes and since this is a bugfix for stable I want to keep it as small
> as possible.
> 
> Yiwen> In addition, I still worried with whether there has other
> Yiwen> situations which will return EIO or other error. In this way,
> Yiwen> MD/DM still can happen this type of problem, so I think may be in
> Yiwen> multipath we still needs a protection to avoid it.
> 
> There are various error scenarios where we can end up bailing with a
> BLKPREP_KILL. But the general rule of thumb is that these conditions all
> demand a retry. The optional commands like WRITE SAME and UNMAP are
> special in that they are irrecoverable.
> 
> Yiwen> At last, I have a additional problem, I remember that you
> Yiwen> previously send a series of patches about XCOPY, why don't have
> Yiwen> any news latter later? I very much expect that I can see these
> Yiwen> patches which are merged into kernel.
> 
> I am working on a refresh of the series that includes token-based copy
> offload support in addition to EXTENDED COPY. The patches depend on Mike
> Christie's request flag patch series which has yet to be merged.
> 
Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>

Thanks,
Yiwen Jiang.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled

2016-02-04 Thread jiangyiwen
On 2016/2/4 14:48, Martin K. Petersen wrote:
> When a storage device rejects a WRITE SAME command we will disable write
> same functionality for the device and return -EREMOTEIO to the block
> layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
> failing the path.
> 
> Yiwen Jiang discovered a small race where WRITE SAME requests issued
> simultaneously would cause -EIO to be returned. This happened because
> any requests being prepared after WRITE SAME had been disabled for the
> device caused us to return BLKPREP_KILL. The latter caused the block
> layer to return -EIO upon completion.
> 
> To overcome this we introduce BLKPREP_INVALID which indicates that this
> is an invalid request for the device. blk_peek_request() is modified to
> return -EREMOTEIO in that case.
> 
> Reported-by: Yiwen Jiang 
> Suggested-by: Mike Snitzer 
> Signed-off-by: Martin K. Petersen 
> 
> ---
> 
> I contemplated making blk_peek_request() use rq->errors to decide what
> to return up the stack. However, I cringed at mixing errnos and SCSI
> midlayer status information in the same location.
> ---
>  block/blk-core.c   | 6 --
>  drivers/scsi/sd.c  | 4 ++--
>  include/linux/blkdev.h | 9 ++---
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 33e2f62d5062..ee833af2f892 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2446,14 +2446,16 @@ struct request *blk_peek_request(struct request_queue 
> *q)
>  
>   rq = NULL;
>   break;
> - } else if (ret == BLKPREP_KILL) {
> + } else if (ret == BLKPREP_KILL || ret == BLKPREP_INVALID) {
> + int err = ret == BLKPREP_INVALID ? -EREMOTEIO : -EIO;
> +
>   rq->cmd_flags |= REQ_QUIET;
>   /*
>* Mark this request as started so we don't trigger
>* any debug logic in the end I/O path.
>*/
>   blk_start_request(rq);
> - __blk_end_request_all(rq, -EIO);
> + __blk_end_request_all(rq, err);
>   } else {
>   printk(KERN_ERR "%s: bad return=%d\n", __func__, ret);
>   break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ec163d08f6c3..6e841c6da632 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -761,7 +761,7 @@ static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
>   break;
>  
>   default:
> - ret = BLKPREP_KILL;
> + ret = BLKPREP_INVALID;
>   goto out;
>   }
>  
> @@ -839,7 +839,7 @@ static int sd_setup_write_same_cmnd(struct scsi_cmnd *cmd)
>   int ret;
>  
>   if (sdkp->device->no_write_same)
> - return BLKPREP_KILL;
> + return BLKPREP_INVALID;
>  
>   BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c70e3588a48c..e990d181625a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -680,9 +680,12 @@ static inline bool blk_write_same_mergeable(struct bio 
> *a, struct bio *b)
>  /*
>   * q->prep_rq_fn return values
>   */
> -#define BLKPREP_OK   0   /* serve it */
> -#define BLKPREP_KILL 1   /* fatal error, kill */
> -#define BLKPREP_DEFER2   /* leave on queue */
> +enum {
> + BLKPREP_OK, /* serve it */
> + BLKPREP_KILL,   /* fatal error, kill, return -EIO */
> + BLKPREP_INVALID,/* invalid command, kill, return -EREMOTEIO */
> + BLKPREP_DEFER,  /* leave on queue */
> +};
>  
>  extern unsigned long blk_max_low_pfn, blk_max_pfn;
>  
> 
Hi Martin,
It is very good, I totally agree with this patch. But I have
three questions:

First, I don't understand why blk_peek_request() return EREMOTEIO,
as I know, in this situation we only prepare scsi command
without sending to device, and I think EREMOTEIO should
be returned only when IO has already sent to device, maybe
I don't understand definition of EREMOTEIO.
So, Why don't return the errno with EOPNOTSUPP?

In addition, I still worried with whether there has other
situations which will return EIO or other error. In this
way, MD/DM still can happen this type of problem, so I think
may be in multipath we still needs a protection to avoid it.

At last, I have a additional problem, I remember that you
previously send a series of patches about XCOPY, why don't
have any news latter later? I very much expect that I can
see these patches which are merged into kernel.

Thanks,
Yiwen Jiang.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: dm-mpath: fix a tiny case which can cause an infinite loop

2016-02-04 Thread jiangyiwen
On 2016/2/4 12:25, Mike Snitzer wrote:
> On Wed, Feb 03 2016 at 10:49pm -0500,
> jiangyiwen <jiangyi...@huawei.com> wrote:
> 
>> On 2016/2/4 11:24, Mike Snitzer wrote:
>>> On Wed, Feb 03 2016 at  9:08pm -0500,
>>> jiangyiwen <jiangyi...@huawei.com> wrote:
>>>
>>>> When two processes submit WRTIE SAME bio simultaneously and
>>>> first IO return failed because of INVALID FIELD IN CDB, and
>>>> then second IO can enter into an infinite loop.
>>>> The problem can be described as follows:
>>>>
>>>> process 1  process 2
>>>> submit_bio(REQ_WRITE_SAME) and
>>>> wait io completion
>>>>begin submit_bio(REQ_WRITE_SAME)
>>>>-> blk_queue_bio()
>>>>  -> dm_request_fn()
>>>>-> map_request()
>>>>  -> scsi_request_fn()
>>>>-> blk_peek_request()
>>>>  -> scsi_prep_fn()
>>>> In the moment, IO return and
>>>> sense_key with illegal_request,
>>>> sense_code with 0x24(INVALID FIELD IN CDB).
>>>> In this way it will set no_write_same = 1
>>>> in sd_done() and disable write same
>>>>In sd_setup_write_same_cmnd()
>>>>when find (no_write_same == 1)
>>>>will return BLKPREP_KILL,
>>>>and then in blk_peek_request()
>>>>set error to EIO.
>>>>However, in multipath_end_io()
>>>>when find error is EIO it will
>>>>fail path and retry even if
>>>>device doesn't support WRITE SAME.
>>>>
>>>> In this situation, when process of multipathd reinstate
>>>> path by using test_unit_ready periodically, it will cause
>>>> second WRITE SAME IO enters into an infinite loop and
>>>> loop never terminates.
>>>>
>>>> In do_end_io(), when finding device doesn't support WRITE SAME and
>>>> request is WRITE SAME, return EOPNOTSUPP to applications.
>>>>
>>>> Signed-off-by: Yiwen Jiang <jiangyi...@huawei.com>
>>>> Reviewed-by: Joseph Qi <joseph...@huawei.com>
>>>> Reviewed-by: xuejiufei <xuejiu...@huawei.com>
>>>> ---
>>>>  drivers/md/dm-mpath.c | 5 +
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>>>> index cfa29f5..ad1602a 100644
>>>> --- a/drivers/md/dm-mpath.c
>>>> +++ b/drivers/md/dm-mpath.c
>>>> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct 
>>>> request *clone,
>>>>if (noretry_error(error))
>>>>return error;
>>>>
>>>> +  /* do not retry in case of WRITE SAME not supporting */
>>>> +  if ((clone->cmd_flags & REQ_WRITE_SAME) &&
>>>> +  !clone->q->limits.max_write_same_sectors)
>>>> +  return -EOPNOTSUPP;
>>>> +
>>>>if (mpio->pgpath)
>>>>fail_path(mpio->pgpath);
>>>>
>>>
>>> Did you test this patch?  Looks like it isn't going to make a
>>> difference.  'error' will be -EREMOTEIO, which will be caught by
>>> noretry_error().  So we'll never go on to run your new code.
>>>
>>> Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").
>>>
>>> I think DM already handles this case properly.  The only thing is it'll
>>> return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.
>>>
>>> .
>>>
>> Hi Mike,
>> Yes, I have already test in version linux-3.8, and I also have already
>> carefully checked latest kernel code, and find that it also exists this
>> problem. I know about this commit 7eee4ae2db ("dm: disable WRITE SAME
>> if it fails"), it only can solve first IO situation which I mentioned
>> above, in other word

[dm-devel] [PATCH] dm-mpath: fix a tiny case which can cause an infinite loop

2016-02-03 Thread jiangyiwen
When two processes submit WRTIE SAME bio simultaneously and
first IO return failed because of INVALID FIELD IN CDB, and
then second IO can enter into an infinite loop.
The problem can be described as follows:

process 1  process 2
submit_bio(REQ_WRITE_SAME) and
wait io completion
   begin submit_bio(REQ_WRITE_SAME)
   -> blk_queue_bio()
 -> dm_request_fn()
   -> map_request()
 -> scsi_request_fn()
   -> blk_peek_request()
 -> scsi_prep_fn()
In the moment, IO return and
sense_key with illegal_request,
sense_code with 0x24(INVALID FIELD IN CDB).
In this way it will set no_write_same = 1
in sd_done() and disable write same
   In sd_setup_write_same_cmnd()
   when find (no_write_same == 1)
   will return BLKPREP_KILL,
   and then in blk_peek_request()
   set error to EIO.
   However, in multipath_end_io()
   when find error is EIO it will
   fail path and retry even if
   device doesn't support WRITE SAME.

In this situation, when process of multipathd reinstate
path by using test_unit_ready periodically, it will cause
second WRITE SAME IO enters into an infinite loop and
loop never terminates.

In do_end_io(), when finding device doesn't support WRITE SAME and
request is WRITE SAME, return EOPNOTSUPP to applications.

Signed-off-by: Yiwen Jiang 
Reviewed-by: Joseph Qi 
Reviewed-by: xuejiufei 
---
 drivers/md/dm-mpath.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index cfa29f5..ad1602a 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct request 
*clone,
if (noretry_error(error))
return error;

+   /* do not retry in case of WRITE SAME not supporting */
+   if ((clone->cmd_flags & REQ_WRITE_SAME) &&
+   !clone->q->limits.max_write_same_sectors)
+   return -EOPNOTSUPP;
+
if (mpio->pgpath)
fail_path(mpio->pgpath);

-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dm-mpath: fix a tiny case which can cause an infinite loop

2016-02-03 Thread jiangyiwen
On 2016/2/4 11:24, Mike Snitzer wrote:
> On Wed, Feb 03 2016 at  9:08pm -0500,
> jiangyiwen <jiangyi...@huawei.com> wrote:
> 
>> When two processes submit WRTIE SAME bio simultaneously and
>> first IO return failed because of INVALID FIELD IN CDB, and
>> then second IO can enter into an infinite loop.
>> The problem can be described as follows:
>>
>> process 1  process 2
>> submit_bio(REQ_WRITE_SAME) and
>> wait io completion
>>begin submit_bio(REQ_WRITE_SAME)
>>-> blk_queue_bio()
>>  -> dm_request_fn()
>>-> map_request()
>>  -> scsi_request_fn()
>>-> blk_peek_request()
>>  -> scsi_prep_fn()
>> In the moment, IO return and
>> sense_key with illegal_request,
>> sense_code with 0x24(INVALID FIELD IN CDB).
>> In this way it will set no_write_same = 1
>> in sd_done() and disable write same
>>In sd_setup_write_same_cmnd()
>>when find (no_write_same == 1)
>>will return BLKPREP_KILL,
>>and then in blk_peek_request()
>>set error to EIO.
>>However, in multipath_end_io()
>>when find error is EIO it will
>>fail path and retry even if
>>device doesn't support WRITE SAME.
>>
>> In this situation, when process of multipathd reinstate
>> path by using test_unit_ready periodically, it will cause
>> second WRITE SAME IO enters into an infinite loop and
>> loop never terminates.
>>
>> In do_end_io(), when finding device doesn't support WRITE SAME and
>> request is WRITE SAME, return EOPNOTSUPP to applications.
>>
>> Signed-off-by: Yiwen Jiang <jiangyi...@huawei.com>
>> Reviewed-by: Joseph Qi <joseph...@huawei.com>
>> Reviewed-by: xuejiufei <xuejiu...@huawei.com>
>> ---
>>  drivers/md/dm-mpath.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
>> index cfa29f5..ad1602a 100644
>> --- a/drivers/md/dm-mpath.c
>> +++ b/drivers/md/dm-mpath.c
>> @@ -1269,6 +1269,11 @@ static int do_end_io(struct multipath *m, struct 
>> request *clone,
>>  if (noretry_error(error))
>>  return error;
>>
>> +/* do not retry in case of WRITE SAME not supporting */
>> +if ((clone->cmd_flags & REQ_WRITE_SAME) &&
>> +!clone->q->limits.max_write_same_sectors)
>> +return -EOPNOTSUPP;
>> +
>>  if (mpio->pgpath)
>>  fail_path(mpio->pgpath);
>>
> 
> Did you test this patch?  Looks like it isn't going to make a
> difference.  'error' will be -EREMOTEIO, which will be caught by
> noretry_error().  So we'll never go on to run your new code.
> 
> Please see commit 7eee4ae2db ("dm: disable WRITE SAME if it fails").
> 
> I think DM already handles this case properly.  The only thing is it'll
> return -EREMOTEIO (rather than -EOPNOTSUPP) further up the stack.
> 
> .
> 
Hi Mike,
Yes, I have already test in version linux-3.8, and I also have already
carefully checked latest kernel code, and find that it also exists this
problem. I know about this commit 7eee4ae2db ("dm: disable WRITE SAME
if it fails"), it only can solve first IO situation which I mentioned
above, in other words, when WRITE SAME IO truly send to device, it
actually return -EREMOTEIO if device doesn't support WRITE SAME.
But in above situation which issues two WRITE SAME IO simultaneously,
second IO will not truly send to device instead of returning
BLKPREP_KILL in sd_setup_write_same_cmnd() because find
(no_write_same == 1) which no_write_same is set when first IO returned,
and then in blk_peek_request() will return EIO to MD/DM which caused
the problem above mentioned.

I have described detailed process of problem, you will find actually
it is a problem when reviewed carefully.

Thanks,
Yiwen Jiang


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html