RE: [PATCH] [v3] blk-mq-tag: make blk_mq_tag_busy() return void

2020-12-15 Thread Tianxianting
Hi Jens,
Could I get your comments? Thanks.

-Original Message-
From: tianxianting (RD) 
Sent: Thursday, December 10, 2020 2:05 PM
To: ax...@kernel.dk
Cc: ming@redhat.com; linux-bl...@vger.kernel.org; 
linux-kernel@vger.kernel.org; tianxianting (RD) 
Subject: [PATCH] [v3] blk-mq-tag: make blk_mq_tag_busy() return void

No one cares about the return value of blk_mq_tag_busy() and
__blk_mq_tag_busy(), so make them return void.

The other change is to simplify blk_mq_tag_idle().

Signed-off-by: Xianting Tian 
Reviewed-by: Ming Lei 
---
 block/blk-mq-tag.c |  4 +---
 block/blk-mq-tag.h | 16 ++--
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9c92053e7..01c0bb1fb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -21,7 +21,7 @@
  * to get tag when first time, the other shared-tag users could reserve
  * budget for it.
  */
-bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
+void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
@@ -35,8 +35,6 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 			atomic_inc(&hctx->tags->active_queues);
}
-
-   return true;
 }
 
 /*
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 7d3e6b333..4b4ccd794 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -60,23 +60,19 @@ enum {
BLK_MQ_TAG_MAX  = BLK_MQ_NO_TAG - 1,
 };
 
-extern bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *);
+extern void __blk_mq_tag_busy(struct blk_mq_hw_ctx *);
 extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
-static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
+static inline void blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-   if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
-   return false;
-
-   return __blk_mq_tag_busy(hctx);
+   if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
+   __blk_mq_tag_busy(hctx);
 }
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-   if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
-   return;
-
-   __blk_mq_tag_idle(hctx);
+   if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
+   __blk_mq_tag_idle(hctx);
 }
 
 static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags *tags,
--
2.17.1



RE: [PATCH] [v2] tty: Protect disc_data in n_tty_close and n_tty_flush_buffer

2020-12-14 Thread Tianxianting
Hi Greg KH,
Could we get your comments on the updates? Thanks.

-Original Message-
From: gaoyan (RD) 
Sent: Friday, December 11, 2020 2:09 PM
To: Greg KH 
Cc: jirisl...@kernel.org; linux-kernel@vger.kernel.org; tianxianting (RD) 

Subject: Re: [PATCH] [v2] tty: Protect disc_data in n_tty_close and 
n_tty_flush_buffer

Hi Greg KH:
I tried to reproduce this problem in testing, but it is difficult to
trigger again. It is hard to hit the timing window in which n_tty_flush_buffer
accesses the disc_data that was just set to NULL by n_tty_close.

Thanks
Gao Yan

-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Thursday, December 10, 2020 2:22 PM
To: gaoyan (RD) 
Cc: jirisl...@kernel.org; linux-kernel@vger.kernel.org; tianxianting (RD) 

Subject: Re: [PATCH] [v2] tty: Protect disc_data in n_tty_close and 
n_tty_flush_buffer

On Thu, Dec 10, 2020 at 10:25:07AM +0800, Yan.Gao wrote:
> n_tty_flush_buffer can run in parallel with n_tty_close, which sets
> tty->disc_data to NULL. n_tty_flush_buffer accesses tty->disc_data,
> so we must prevent n_tty_close from clearing tty->disc_data while
> n_tty_flush_buffer still holds a non-NULL view of tty->disc_data.
> 
> So we need to make sure that accesses to disc_data are serialized
> using tty->termios_rwsem.
> 
> Here is an example I hit:
> When n_tty_flush_buffer accesses the tty struct, disc_data is valid.
> However, by the time reset_buffer_flags accesses tty->disc_data, it has
> become NULL, so the kernel crashes when it dereferences
> tty->disc_data->real_tail. I suspect another thread changed
> tty->disc_data to NULL; with the N_TTY line discipline, n_tty_close
> sets tty->disc_data to NULL. So use tty->termios_rwsem to protect
> disc_data between close and flush_buffer.
> 
> IP: reset_buffer_flags+0x9/0xf0
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP
> CPU: 23 PID: 2087626 Comm: (agetty) Kdump: loaded Tainted: G Hardware
> name: UNISINSIGHT X3036P-G3/ST01M2C7S, BIOS 2.00.13 01/11/2019
> task: 9c4e9da71e80 task.stack: b30cfe898000
> RIP: 0010:reset_buffer_flags+0x9/0xf0
> RSP: 0018:b30cfe89bca8 EFLAGS: 00010246
> RAX: 9c4e9da71e80 RBX: 9c368d1bac00 RCX: 
> RDX:  RSI: 9c4ea17b50f0 RDI: 
> RBP: b30cfe89bcc8 R08: 0100 R09: 0001
> R10: 0001 R11:  R12: 9c368d1bacc0
> R13: 9c20cfd18428 R14: 9c4ea17b50f0 R15: 9c368d1bac00
> FS:  7f9fbbe97940() GS:9c375c74()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2260 CR3: 002f72233003 CR4: 007606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
> ? n_tty_flush_buffer+0x2a/0x60
> tty_buffer_flush+0x76/0x90
> tty_ldisc_flush+0x22/0x40
> vt_ioctl+0x5a7/0x10b0
> ? n_tty_ioctl_helper+0x27/0x110
> tty_ioctl+0xef/0x8c0
> do_vfs_ioctl+0xa7/0x5e0
> ? __audit_syscall_entry+0xaf/0x100
> ? syscall_trace_enter+0x1d0/0x2b0
> SyS_ioctl+0x79/0x90
> do_syscall_64+0x6c/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25
> 
> n_tty_flush_buffer--->tty->disc_data is OK
>   ->reset_buffer_flags -->tty->disc_data is NULL
> 
> Signed-off-by: Yan.Gao 
> Reviewed-by: Xianting Tian 
> ---
>  drivers/tty/n_tty.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index
> 7e5e36315..e78124ce1 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1892,8 +1892,10 @@ static void n_tty_close(struct tty_struct *tty)
>   if (tty->link)
>   n_tty_packet_mode_flush(tty);
>  
> +	down_write(&tty->termios_rwsem);
>  	vfree(ldata);
>  	tty->disc_data = NULL;
> +	up_write(&tty->termios_rwsem);
>  }
>  
>  /**

So does this solve your problem in testing?  Do you have a reproducer for this 
problem?

thanks,

greg k-h
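
For illustration only, a minimal sketch (not the actual n_tty.c code; the
function name is made up) of how the flush side is meant to pair with the
down_write() added in the patch above, assuming both paths serialize on
tty->termios_rwsem:

	#include <linux/tty.h>
	#include <linux/rwsem.h>

	/* Illustrative flush side: take the same rwsem before touching disc_data. */
	static void example_flush_side(struct tty_struct *tty)
	{
		down_write(&tty->termios_rwsem);
		if (tty->disc_data) {
			/* close may already have cleared disc_data; only touch it here,
			 * e.g. reset the ldata buffer state as reset_buffer_flags() does. */
		}
		up_write(&tty->termios_rwsem);
	}
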


RE: [PATCH] [v2] blk-mq-tag: make blk_mq_tag_busy() return void

2020-12-09 Thread Tianxianting
Yes,
Sorry, I missed it in v2, so I sent v3 :)

-Original Message-
From: Chaitanya Kulkarni [mailto:chaitanya.kulka...@wdc.com] 
Sent: Thursday, December 10, 2020 2:21 PM
To: tianxianting (RD) ; ax...@kernel.dk
Cc: ming@redhat.com; linux-bl...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] blk-mq-tag: make blk_mq_tag_busy() return void

On 12/9/20 22:06, Xianting Tian wrote:
> As no one cares about the return value of blk_mq_tag_busy() and 
> __blk_mq_tag_busy(), so make them return void.
>
> Other change is to simplify blk_mq_tag_idle().
>
> Signed-off-by: Xianting Tian 
> Reviewed-by: Ming Lei 
> ---
>  block/blk-mq-tag.c |  4 ++--
>  block/blk-mq-tag.h | 16 ++--
>  2 files changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 
> 9c92053e7..21ff7d156 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -21,7 +21,7 @@
>   * to get tag when first time, the other shared-tag users could reserve
>   * budget for it.
>   */
> -bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> +void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
>   if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
> @@ -36,7 +36,7 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  			atomic_inc(&hctx->tags->active_queues);
>   }
>  
> - return true;
> + return;
If the above return is the last statement, then you need to remove it
entirely instead of using a return with no value.

Also, please add the version history.


RE: [PATCH] blk-mq-tag: make blk_mq_tag_busy() return void

2020-12-09 Thread Tianxianting
Thanks for the comments.
So blk_mq_tag_idle() can also be simplified as below; I will send a v2 patch
for review.

static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-   if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
-   return;
-
-   __blk_mq_tag_idle(hctx);
+   if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
+   __blk_mq_tag_idle(hctx);
 }

-Original Message-
From: Ming Lei [mailto:ming@redhat.com] 
Sent: Thursday, December 10, 2020 12:01 PM
To: tianxianting (RD) 
Cc: ax...@kernel.dk; linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] blk-mq-tag: make blk_mq_tag_busy() return void

On Tue, Dec 08, 2020 at 03:40:02PM +0800, Xianting Tian wrote:
> As no one cares about the return value of blk_mq_tag_busy() and 
> __blk_mq_tag_busy(), so make them return void.
> 
> Signed-off-by: Xianting Tian 
> ---
>  block/blk-mq-tag.c | 4 ++--
>  block/blk-mq-tag.h | 8 
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 
> 9c92053e7..21ff7d156 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -21,7 +21,7 @@
>   * to get tag when first time, the other shared-tag users could reserve
>   * budget for it.
>   */
> -bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> +void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
>   if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
> @@ -36,7 +36,7 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  			atomic_inc(&hctx->tags->active_queues);
>   }
>  
> - return true;
> + return;
>  }
>  
>  /*
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index 
> 7d3e6b333..dd80e5a85 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -60,15 +60,15 @@ enum {
>   BLK_MQ_TAG_MAX  = BLK_MQ_NO_TAG - 1,
>  };
>  
> -extern bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *);
> +extern void __blk_mq_tag_busy(struct blk_mq_hw_ctx *);
>  extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
>  
> -static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> +static inline void blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
>   if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
> - return false;
> + return;
>  
> - return __blk_mq_tag_busy(hctx);
> + __blk_mq_tag_busy(hctx);

The above can be simplified as:

if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
__blk_mq_tag_busy(hctx);

Otherwise, looks fine:

Reviewed-by: Ming Lei 

Thanks,
Ming



RE: [PATCH] sched/rt: Print curr when RT throttling activated

2020-12-08 Thread Tianxianting
Thanks,
We hit an issue where a normal thread could not get any CPU time,
and at that moment we found the 'sched: RT throttling activated' log.

So I think this patch is useful for such issues.

Could I get more comments? Thanks in advance.
-Original Message-
From: Steven Rostedt [mailto:rost...@goodmis.org] 
Sent: Thursday, December 03, 2020 10:40 PM
To: tianxianting (RD) 
Cc: mi...@redhat.com; pet...@infradead.org; juri.le...@redhat.com; 
vincent.guit...@linaro.org; dietmar.eggem...@arm.com; bseg...@google.com; 
mgor...@suse.de; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/rt: Print curr when RT throttling activated

On Thu, 3 Dec 2020 15:51:29 +0800
Xianting Tian  wrote:

> We may hit an issue where one RT thread occupies the CPU for 950ms out
> of every 1s. The RT thread may be a business thread or some other unknown thread.
> 
> Currently, only the message "sched: RT throttling activated" is printed
> when RT throttling happens. It is hard to know which RT thread caused it;
> for further analysis, we need to add more prints.
> 
> This patch prints the current RT task when RT throttling is activated.
> It helps us identify the offending RT thread right away.

I think this can be useful information to include.

Acked-by: Steven Rostedt (VMware) 

-- Steve

> 
> Signed-off-by: Xianting Tian 
> ---
>  kernel/sched/rt.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 
> f215eea6a..8913f38cb 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -946,7 +946,7 @@ static inline int rt_se_prio(struct sched_rt_entity 
> *rt_se)
>   return rt_task_of(rt_se)->prio;
>  }
>  
> -static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
> +static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq, struct task_struct *curr)
>  {
>   u64 runtime = sched_rt_runtime(rt_rq);
>  
> @@ -970,7 +970,8 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
>*/
>   if (likely(rt_b->rt_runtime)) {
>   rt_rq->rt_throttled = 1;
> -			printk_deferred_once("sched: RT throttling activated\n");
> +			printk_deferred_once("sched: RT throttling activated (curr: pid %d, comm %s)\n",
> +					curr->pid, curr->comm);
>   } else {
>   /*
>* In case we did anyway, make it go away, @@ -1026,7 
> 			 * In case we did anyway, make it go away,
> @@ -1026,7 +1027,7 @@ static void update_curr_rt(struct rq *rq)
>  		raw_spin_lock(&rt_rq->rt_runtime_lock);
>   rt_rq->rt_time += delta_exec;
> - if (sched_rt_runtime_exceeded(rt_rq))
> + if (sched_rt_runtime_exceeded(rt_rq, curr))
>   resched_curr(rq);
>  		raw_spin_unlock(&rt_rq->rt_runtime_lock);
>   }



RE: [PATCH] ext4: remove the null check of bio_vec page

2020-12-03 Thread Tianxianting
Thanks Ted :)

-Original Message-
From: Theodore Y. Ts'o [mailto:ty...@mit.edu] 
Sent: Thursday, December 03, 2020 10:11 PM
To: Jan Kara 
Cc: tianxianting (RD) ; adilger.ker...@dilger.ca; 
linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: remove the null check of bio_vec page

On Wed, Oct 21, 2020 at 12:25:03PM +0200, Jan Kara wrote:
> On Tue 20-10-20 16:22:01, Xianting Tian wrote:
> > bv_page can't be NULL in a valid bio_vec, so we can remove the NULL 
> > check, as we did in other places when calling 
> > bio_for_each_segment_all() to go through all bio_vec of a bio.
> > 
> > Signed-off-by: Xianting Tian 
> 
> Thanks for the patch. It looks good to me. You can add:
> 
> Reviewed-by: Jan Kara 

Applied, thanks.

- Ted


RE: [PATCH] ext4: remove the null check of bio_vec page

2020-12-03 Thread Tianxianting
Thanks Jan.
Hi Ted Tso, could I get your opinion on this patch? Thanks.

-Original Message-
From: Jan Kara [mailto:j...@suse.cz] 
Sent: Saturday, October 24, 2020 12:57 AM
To: tianxianting (RD) 
Cc: Jan Kara ; ty...@mit.edu; adilger.ker...@dilger.ca; 
linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: remove the null check of bio_vec page

On Fri 23-10-20 16:38:16, Tianxianting wrote:
> Thanks Jan
> Can the patch be applied?

Ted Tso is the ext4 maintainer so he should eventually pick up and apply the 
patch. But since there's merge window currently open, I guess he's busy 
shuffling patches to send to Linus. I'd expect he'll get to your patch in a 
week or two.

Honza

> 
> -Original Message-
> From: Jan Kara [mailto:j...@suse.cz]
> Sent: Wednesday, October 21, 2020 6:25 PM
> To: tianxianting (RD) 
> Cc: ty...@mit.edu; adilger.ker...@dilger.ca; j...@suse.cz; 
> linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ext4: remove the null check of bio_vec page
> 
> On Tue 20-10-20 16:22:01, Xianting Tian wrote:
> > bv_page can't be NULL in a valid bio_vec, so we can remove the NULL 
> > check, as we did in other places when calling
> > bio_for_each_segment_all() to go through all bio_vec of a bio.
> > 
> > Signed-off-by: Xianting Tian 
> 
> Thanks for the patch. It looks good to me. You can add:
> 
> Reviewed-by: Jan Kara 
> 
>   Honza
> 
> > ---
> >  fs/ext4/page-io.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index
> > defd2e10d..cb135a944 100644
> > --- a/fs/ext4/page-io.c
> > +++ b/fs/ext4/page-io.c
> > @@ -111,9 +111,6 @@ static void ext4_finish_bio(struct bio *bio)
> > unsigned under_io = 0;
> > unsigned long flags;
> >  
> > -   if (!page)
> > -   continue;
> > -
> > if (fscrypt_is_bounce_page(page)) {
> > bounce_page = page;
> > page = fscrypt_pagecache_page(bounce_page);
> > --
> > 2.17.1
> > 
> --
> Jan Kara 
> SUSE Labs, CR
--
Jan Kara 
SUSE Labs, CR


RE: [PATCH] mm: bio_alloc never fails when set GFP_NOIO, GFP_KERNEL

2020-10-23 Thread Tianxianting
Hi Andrew,
Could I get your comments on this patch? Thanks in advance.

-Original Message-
From: tianxianting (RD) 
Sent: Wednesday, October 21, 2020 11:11 AM
To: a...@linux-foundation.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; tianxianting (RD) 

Subject: [PATCH] mm: bio_alloc never fails when set GFP_NOIO, GFP_KERNEL

bio_alloc() with __GFP_DIRECT_RECLAIM (which is included in GFP_NOIO and
GFP_KERNEL) never fails, as stated in the comments of bio_alloc_bioset().

So we can remove multiple unneeded NULL checks after bio_alloc() and simplify
the code.

We have done it in fs/ext4/readpage.c, fs/ext4/page-io.c, fs/direct-io.c, and 
so forth.

Signed-off-by: Xianting Tian 
---
 mm/page_io.c | 31 +++
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index e485a6e8a..9215bb356 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -30,18 +30,20 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
struct page *page, bio_end_io_t end_io)  {
struct bio *bio;
+   struct block_device *bdev;
 
+   /*
+* bio_alloc will _always_ be able to allocate a bio if
+* __GFP_DIRECT_RECLAIM is set, see comments for bio_alloc_bioset().
+*/
bio = bio_alloc(gfp_flags, 1);
-   if (bio) {
-   struct block_device *bdev;
+	bio->bi_iter.bi_sector = map_swap_page(page, &bdev);
+   bio_set_dev(bio, bdev);
+   bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
+   bio->bi_end_io = end_io;
 
-		bio->bi_iter.bi_sector = map_swap_page(page, &bdev);
-   bio_set_dev(bio, bdev);
-   bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
-   bio->bi_end_io = end_io;
+   bio_add_page(bio, page, thp_size(page), 0);
 
-   bio_add_page(bio, page, thp_size(page), 0);
-   }
return bio;
 }
 
@@ -351,19 +353,13 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
 
ret = 0;
bio = get_swap_bio(GFP_NOIO, page, end_write_func);
-   if (bio == NULL) {
-   set_page_dirty(page);
-   unlock_page(page);
-   ret = -ENOMEM;
-   goto out;
-   }
bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
bio_associate_blkg_from_page(bio, page);
count_swpout_vm_event(page);
set_page_writeback(page);
unlock_page(page);
submit_bio(bio);
-out:
+
return ret;
 }
 
@@ -416,11 +412,6 @@ int swap_readpage(struct page *page, bool synchronous)
 
ret = 0;
bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
-   if (bio == NULL) {
-   unlock_page(page);
-   ret = -ENOMEM;
-   goto out;
-   }
disk = bio->bi_disk;
/*
 * Keep this task valid during swap readpage because the oom killer may
--
2.17.1



RE: [PATCH] ext4: remove the null check of bio_vec page

2020-10-23 Thread Tianxianting
Thanks Jan.
Can the patch be applied?

-Original Message-
From: Jan Kara [mailto:j...@suse.cz] 
Sent: Wednesday, October 21, 2020 6:25 PM
To: tianxianting (RD) 
Cc: ty...@mit.edu; adilger.ker...@dilger.ca; j...@suse.cz; 
linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: remove the null check of bio_vec page

On Tue 20-10-20 16:22:01, Xianting Tian wrote:
> bv_page can't be NULL in a valid bio_vec, so we can remove the NULL 
> check, as we did in other places when calling 
> bio_for_each_segment_all() to go through all bio_vec of a bio.
> 
> Signed-off-by: Xianting Tian 

Thanks for the patch. It looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/ext4/page-io.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 
> defd2e10d..cb135a944 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -111,9 +111,6 @@ static void ext4_finish_bio(struct bio *bio)
>   unsigned under_io = 0;
>   unsigned long flags;
>  
> - if (!page)
> - continue;
> -
>   if (fscrypt_is_bounce_page(page)) {
>   bounce_page = page;
>   page = fscrypt_pagecache_page(bounce_page);
> --
> 2.17.1
> 
--
Jan Kara 
SUSE Labs, CR


RE: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

2020-10-22 Thread Tianxianting
Yes, thanks.
I see. If we add this patch, we need to account for every CPU architecture that
supports nested interrupts.

-Original Message-
From: Finn Thain [mailto:fth...@telegraphics.com.au] 
Sent: Thursday, October 22, 2020 11:29 AM
To: tianxianting (RD) 
Cc: kashyap.de...@broadcom.com; sumit.sax...@broadcom.com; 
shivasharan.srikanteshw...@broadcom.com; j...@linux.ibm.com; 
martin.peter...@oracle.com; megaraidlinux@broadcom.com; 
linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

On Thu, 22 Oct 2020, Tianxianting wrote:

> Do you mean Megasas raid can be used in m68k arch?

m68k is one example of an architecture on which the unstated assumptions in 
your patch would be invalid. Does this help to clarify what I wrote?

If Megasas raid did work on m68k, I'm sure it could potentially benefit from 
the theoretical performance improvement from your patch.

So perhaps you would consider adding support for slower CPUs like m68k.


RE: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

2020-10-21 Thread Tianxianting
Thanks,
Do you mean Megasas raid can be used on the m68k arch?

-Original Message-
From: Finn Thain [mailto:fth...@telegraphics.com.au] 
Sent: Thursday, October 22, 2020 10:25 AM
To: tianxianting (RD) 
Cc: kashyap.de...@broadcom.com; sumit.sax...@broadcom.com; 
shivasharan.srikanteshw...@broadcom.com; j...@linux.ibm.com; 
martin.peter...@oracle.com; megaraidlinux@broadcom.com; 
linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

On Wed, 21 Oct 2020, Xianting Tian wrote:

> Since we already in hard IRQ context when running megasas_isr(),

On m68k, hard irq context does not mean interrupts are disabled. Are there no 
other architectures in that category?

> so use spin_lock() is enough, which is faster than spin_lock_irqsave().
> 

Is that measurable?

> Signed-off-by: Xianting Tian 
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c 
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 2b7e7b5f3..bd186254d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -3977,15 +3977,14 @@ static irqreturn_t megasas_isr(int irq, void 
> *devp)  {
>   struct megasas_irq_context *irq_context = devp;
>   struct megasas_instance *instance = irq_context->instance;
> - unsigned long flags;
>   irqreturn_t rc;
>  
>  	if (atomic_read(&instance->fw_reset_no_pci_access))
>   return IRQ_HANDLED;
>  
> -	spin_lock_irqsave(&instance->hba_lock, flags);
> +	spin_lock(&instance->hba_lock);
>  	rc = megasas_deplete_reply_queue(instance, DID_OK);
> -	spin_unlock_irqrestore(&instance->hba_lock, flags);
> +	spin_unlock(&instance->hba_lock);
>  
>   return rc;
>  }
> 
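
To illustrate the concern raised above with a generic sketch (not megaraid_sas
code; names are made up): if an architecture lets the handler run with
interrupts still enabled, a plain spin_lock() leaves a window for self-deadlock
when the same lock is taken again from another interrupt on the same CPU,
which is exactly what spin_lock_irqsave() rules out.

	#include <linux/interrupt.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(example_lock);

	static irqreturn_t example_isr(int irq, void *data)
	{
		unsigned long flags;

		/*
		 * spin_lock(&example_lock) would only be safe if nothing that
		 * also takes example_lock can interrupt us here; disabling
		 * local interrupts closes that window on every architecture.
		 */
		spin_lock_irqsave(&example_lock, flags);
		/* ... drain the reply queue ... */
		spin_unlock_irqrestore(&example_lock, flags);

		return IRQ_HANDLED;
	}
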


RE: [PATCH] blk-mq: remove the calling of local_memory_node()

2020-10-19 Thread Tianxianting
Thanks Michal,
Hi Raghavendra, Jens,
Could you help comment on this issue? Thanks in advance.

-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Monday, October 19, 2020 7:40 PM
To: tianxianting (RD) 
Cc: ax...@kernel.dk; raghavendra...@linux.vnet.ibm.com; 
linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] blk-mq: remove the calling of local_memory_node()

On Mon 19-10-20 16:20:47, Xianting Tian wrote:
> We don't need to check whether the node is memoryless numa node before 
> calling allocator interface. SLUB(and SLAB,SLOB) relies on the page 
> allocator to pick a node. Page allocator should deal with memoryless 
> nodes just fine. It has zonelists constructed for each possible nodes.
> And it will automatically fall back into a node which is closest to 
> the requested node. As long as __GFP_THISNODE is not enforced of course.
> 
> The code comments of kmem_cache_alloc_node() of SLAB also showed this:
>  * Fallback to other node is possible if __GFP_THISNODE is not set.
> 
> blk-mq code doesn't set __GFP_THISNODE, so we can remove the calling 
> of local_memory_node().

Yes, this is indeed the case. I cannot really judge the blk-mq code but it 
seems to be unnecessary. Maybe there are some subtle details not explained by 
bffed457160ab though.

> Fixes: bffed457160ab ("blk-mq: Avoid memoryless numa node encoded in 
> hctx numa_node")

But the existing code is not broken. It just overdoes what needs to be done. So 
effectively bffed457160ab was not needed. I do not think that Fixes is really 
necessary.

> Signed-off-by: Xianting Tian 
> ---
>  block/blk-mq-cpumap.c | 2 +-
>  block/blk-mq.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 
> 0157f2b34..3db84d319 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -89,7 +89,7 @@ int blk_mq_hw_queue_to_node(struct blk_mq_queue_map 
> *qmap, unsigned int index)
>  
>   for_each_possible_cpu(i) {
>   if (index == qmap->mq_map[i])
> - return local_memory_node(cpu_to_node(i));
> + return cpu_to_node(i);
>   }
>  
>   return NUMA_NO_NODE;
> diff --git a/block/blk-mq.c b/block/blk-mq.c index 
> cdced4aca..48f8366b2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2737,7 +2737,7 @@ static void blk_mq_init_cpu_queues(struct request_queue 
> *q,
>   for (j = 0; j < set->nr_maps; j++) {
>   hctx = blk_mq_map_queue_type(q, j, i);
>   if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
> - hctx->numa_node = 
> local_memory_node(cpu_to_node(i));
> + hctx->numa_node = cpu_to_node(i);
>   }
>   }
>  }
> --
> 2.17.1

--
Michal Hocko
SUSE Labs
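
As a small illustration of the allocator semantics described above (generic
example, not blk-mq code; the function name is made up): without
__GFP_THISNODE the allocator simply walks the node's zonelist and falls back
to the nearest node that has memory, so passing a memoryless node id is
harmless; only __GFP_THISNODE pins the allocation to that exact node.

	#include <linux/slab.h>

	static void *example_alloc(size_t size, int nid)
	{
		/* Falls back to the closest node with memory if 'nid' is memoryless. */
		void *a = kzalloc_node(size, GFP_KERNEL, nid);

		/* May return NULL on a memoryless node: no fallback is allowed. */
		void *b = kzalloc_node(size, GFP_KERNEL | __GFP_THISNODE, nid);

		kfree(b);
		return a;
	}
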


RE: [PATCH] mm: Make allocator take care of memoryless numa node

2020-10-19 Thread Tianxianting
Thanks Michal,
Yes, it is the commit bffed457160ab. Sorry I forgot to paste it in my previous 
reply.


-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Monday, October 19, 2020 3:07 PM
To: tianxianting (RD) 
Cc: c...@linux.com; penb...@kernel.org; rient...@google.com; 
iamjoonsoo@lge.com; a...@linux-foundation.org; linux...@kvack.org; 
linux-kernel@vger.kernel.org; k...@kernel.org; alexei.starovoi...@gmail.com
Subject: Re: [PATCH] mm: Make allocator take care of memoryless numa node

On Sun 18-10-20 14:18:37, Tianxianting wrote:
> Thanks for the comments
> I found in current code, there are two places to call
> local_memory_node(node) before calling kzalloc_node(), I think we can 
> remove them?

I am not sure which code you are talking about. git grep shows me 2 places in 
blk-mq code (e.g. bffed457160ab) and that looks quite bogus to me. Bring that 
up with the respective maintainer and Raghavendra.
The changelog doesn't really describe any problem, if there is any. But from 
the allocator semantic point of view memory less nodes are to be expected and 
the allocator should fallback to the proper node. As long as __GFP_THISNODE is 
not enforced of course.
--
Michal Hocko
SUSE Labs


RE: [PATCH] mm: Make allocator take care of memoryless numa node

2020-10-18 Thread Tianxianting
Thanks for the comments.
I found that in the current code there are two places that call
local_memory_node(node) before calling kzalloc_node(); I think we can remove them?

-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Monday, October 12, 2020 11:06 PM
To: tianxianting (RD) 
Cc: c...@linux.com; penb...@kernel.org; rient...@google.com; 
iamjoonsoo@lge.com; a...@linux-foundation.org; linux...@kvack.org; 
linux-kernel@vger.kernel.org; k...@kernel.org; alexei.starovoi...@gmail.com
Subject: Re: [PATCH] mm: Make allocator take care of memoryless numa node

On Mon 12-10-20 16:27:39, Xianting Tian wrote:
> In architecture like powerpc, we can have cpus without any local 
> memory attached to it. In such cases the node does not have real memory.

Yes, this is normal (unfortunately).

> In many places of current kernel code, it doesn't judge whether the 
> node is memoryless numa node before calling allocator interface.

And that is correct. It shouldn't make any assumption on the memory on a given 
node because that memory might be depleted (similar to no memory) or it can 
disappear at any moment because of the memory offlining.

> This patch is to use local_memory_node(), which is guaranteed to have 
> memory, in allocator interface. local_memory_node() is a noop in other 
> architectures that don't support memoryless nodes.
> 
> As the call path:
>   alloc_pages_node
>   __alloc_pages_node
>   __alloc_pages_nodemask
> and __alloc_pages_node,__alloc_pages_nodemask may be called directly, 
> so only add local_memory_node() in __alloc_pages_nodemask.

The page allocator should deal with memoryless nodes just fine. It has zonelists 
constructed for each possible node, and it will automatically fall back to a 
node which is closest to the requested node.
local_memory_node might be incorrect choice from the topology POV.

What kind of problem are you trying to fix?

> Signed-off-by: Xianting Tian 
> ---
>  include/linux/slab.h |  3 +++
>  mm/page_alloc.c  |  1 +
>  mm/slab.c|  6 +-
>  mm/slob.c|  1 +
>  mm/slub.c| 10 --
>  5 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h index 
> 24df2393e..527e811e0 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -574,6 +574,7 @@ static __always_inline void *kmalloc_node(size_t size, 
> gfp_t flags, int node)
>   flags, node, size);
>   }
>  #endif
> + node = local_memory_node(node);
>   return __kmalloc_node(size, flags, node);  }
>  
> @@ -626,6 +627,8 @@ static inline void *kmalloc_array_node(size_t n, size_t 
> size, gfp_t flags,
>   return NULL;
>   if (__builtin_constant_p(n) && __builtin_constant_p(size))
>   return kmalloc_node(bytes, flags, node);
> +
> + node = local_memory_node(node);
>   return __kmalloc_node(bytes, flags, node);  }
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 
> 6866533de..be63c62c2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4878,6 +4878,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
> order, int preferred_nid,
>   return NULL;
>   }
>  
> + preferred_nid = local_memory_node(preferred_nid);
>   gfp_mask &= gfp_allowed_mask;
>   alloc_mask = gfp_mask;
>  	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask,
>  			&ac, &alloc_mask, &alloc_flags))
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index f658e86ec..263c2f2e1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3575,7 +3575,10 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
>   */
>  void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, 
> int nodeid)  {
> - void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
> + void *ret;
> +
> + nodeid = local_memory_node(nodeid);
> + ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
>  
>   trace_kmem_cache_alloc_node(_RET_IP_, ret,
>   cachep->object_size, cachep->size, @@ 
> -3593,6 +3596,7 @@ void 
> *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,  {
>   void *ret;
>  
> + nodeid = local_memory_node(nodeid);
>   ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
>  
>   ret = kasan_kmalloc(cachep, ret, size, flags); diff --git 
> a/mm/slob.c b/mm/slob.c index 7cc9805c8..1f1c25e06 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -636,6 +636,7 @@ EXPORT_SYMBOL(__kmalloc_node);
>  
>  void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t gfp, int 
> node)  {
> + node = local_memory_node(node);
>   return slob_alloc_node(cachep, g

RE: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

2020-10-16 Thread Tianxianting
Thanks for the explanation.
I got it.

-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Friday, October 16, 2020 9:45 PM
To: tianxianting (RD) 
Cc: a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

On Fri 16-10-20 13:20:41, Tianxianting wrote:
> Thanks
> I understood what you said :)
> But whether it is proper to check reschedule in every loop when lock is 
> taken? 

I do not see any actual problem TBH. cond_resched is mostly to increase 
interactivity for non preemptible kernel. It can reduce throughput but this is 
a memory reclaim path and I do not expect this to contribute to any moderate 
hot paths. Direct reclaim doesn't really count as a hot path.

--
Michal Hocko
SUSE Labs


RE: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

2020-10-16 Thread Tianxianting
Thanks,
I understood what you said :)
But is it proper to check for a reschedule on every loop iteration when the lock is taken?

By the way, I did not actually hit an issue with this; I was just studying the code and
came up with one possible optimization based on my understanding.

-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Friday, October 16, 2020 9:02 PM
To: tianxianting (RD) 
Cc: a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

On Fri 16-10-20 12:48:23, Tianxianting wrote:
> Thanks, my understanding is,
> In shrink_slab(), do_shrink_slab() will do the real reclaim work, which will 
> occupy current cpu and consume more cpu time, so we need to trigger a 
> reschedule after reclaim.
> But if it jumps to 'out' label, that means we don't do the reclaim work at 
> this time, it won't cause other thread getting starvation, so we don't need 
> to call cond_resched() in this case.
> Is it right?

You are almost right. But consider the situation when the lock is taken for quite 
some time: do_shrink_slab cannot make any forward progress and effectively 
busy-loops. Unless the caller does cond_resched it might cause soft lockups.

Anyway, let me try to ask again: why would this be a problem that 
deserves a fix?

> 
> -Original Message-
> From: Michal Hocko [mailto:mho...@suse.com]
> Sent: Friday, October 16, 2020 8:08 PM
> To: tianxianting (RD) 
> Cc: a...@linux-foundation.org; linux...@kvack.org; 
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] mm: vmscan: avoid a unnecessary reschedule in 
> shrink_slab()
> 
> On Fri 16-10-20 11:39:52, Xianting Tian wrote:
> > In shrink_slab(), it directly goes to 'out' label only when it can't 
> get the lock of shrinker_rwsem. In this case, it doesn't do the real 
> > work of shrinking slab, so we don't need trigger a reschedule by 
> > cond_resched().
> 
> Your changelog doesn't explain why this is not needed or undesirable. Do you 
> see any actual problem?
> 
> The point of this code is to provide a deterministic scheduling point 
> regardless of the shrinker_rwsem.
> 
> > 
> > Signed-off-by: Xianting Tian 
> > ---
> >  mm/vmscan.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c index 466fc3144..676e97b28
> > 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -687,8 +687,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int 
> > nid,
> > }
> >  
> > 	up_read(&shrinker_rwsem);
> > -out:
> > +
> > cond_resched();
> > +out:
> > return freed;
> >  }
> >  
> > --
> > 2.17.1
> > 
> 
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs
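
A small sketch of the situation described above (illustrative only, not the
actual vmscan code; names are made up): keeping cond_resched() outside the
trylock branch guarantees a scheduling point even when the semaphore cannot be
taken, which is what prevents a caller that keeps retrying from soft-locking
the CPU.

	#include <linux/rwsem.h>
	#include <linux/sched.h>

	static DECLARE_RWSEM(example_rwsem);

	static unsigned long example_do_reclaim(void)
	{
		return 0;	/* placeholder for the real reclaim work */
	}

	static unsigned long example_reclaim_pass(void)
	{
		unsigned long freed = 0;

		if (down_read_trylock(&example_rwsem)) {
			freed = example_do_reclaim();	/* the expensive part */
			up_read(&example_rwsem);
		}
		/*
		 * Unconditional: even when the lock could not be taken, give
		 * other tasks a chance to run before the caller tries again.
		 */
		cond_resched();
		return freed;
	}
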


RE: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

2020-10-16 Thread Tianxianting
Thanks, my understanding is:
In shrink_slab(), do_shrink_slab() does the real reclaim work, which occupies
the current CPU and consumes CPU time, so we need to trigger a reschedule
after the reclaim.
But if it jumps to the 'out' label, we did not do any reclaim work this time,
so it won't starve other threads and we don't need to call cond_resched() in
this case.
Is that right?

-Original Message-
From: Michal Hocko [mailto:mho...@suse.com] 
Sent: Friday, October 16, 2020 8:08 PM
To: tianxianting (RD) 
Cc: a...@linux-foundation.org; linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: vmscan: avoid a unnecessary reschedule in shrink_slab()

On Fri 16-10-20 11:39:52, Xianting Tian wrote:
> In shrink_slab(), it directly goes to 'out' label only when it can't 
> get the lock of shrinker_rwsem. In this case, it doesn't do the real 
> work of shrinking slab, so we don't need trigger a reschedule by 
> cond_resched().

Your changelog doesn't explain why this is not needed or undesirable. Do you 
see any actual problem?

The point of this code is to provide a deterministic scheduling point 
regardless of the shrinker_rwsem.

> 
> Signed-off-by: Xianting Tian 
> ---
>  mm/vmscan.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c index 466fc3144..676e97b28 
> 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -687,8 +687,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>   }
>  
>  	up_read(&shrinker_rwsem);
> -out:
> +
>   cond_resched();
> +out:
>   return freed;
>  }
>  
> --
> 2.17.1
> 

-- 
Michal Hocko
SUSE Labs


RE: [PATCH] IB/hfi1: Avoid allocing memory on memoryless numa node

2020-10-12 Thread Tianxianting
Hi Dennis
Thanks for the comments
If it depends on x86_64, I think this issue doesn't exist.
Sorry to disturb you.

-Original Message-
From: Dennis Dalessandro [mailto:dennis.dalessan...@cornelisnetworks.com] 
Sent: Monday, October 12, 2020 8:37 PM
To: tianxianting (RD) ; mike.marcinis...@intel.com; 
dennis.dalessan...@intel.com; dledf...@redhat.com; j...@ziepe.ca
Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] IB/hfi1: Avoid allocing memory on memoryless numa node

On 10/10/2020 4:57 AM, Xianting Tian wrote:
> In architecture like powerpc, we can have cpus without any local 
> memory attached to it. In such cases the node does not have real memory.
> 
> Use local_memory_node(), which is guaranteed to have memory.
> local_memory_node is a noop in other architectures that does not 
> support memoryless nodes.
> 
> Signed-off-by: Xianting Tian 
> ---
>   drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/file_ops.c 
> b/drivers/infiniband/hw/hfi1/file_ops.c
> index 8ca51e43c..79fa22cc7 100644
> --- a/drivers/infiniband/hw/hfi1/file_ops.c
> +++ b/drivers/infiniband/hw/hfi1/file_ops.c
> @@ -965,7 +965,7 @@ static int allocate_ctxt(struct hfi1_filedata *fd, struct 
> hfi1_devdata *dd,
>*/
>   fd->rec_cpu_num = hfi1_get_proc_affinity(dd->node);
>   if (fd->rec_cpu_num != -1)
> - numa = cpu_to_node(fd->rec_cpu_num);
> + numa = local_memory_node(cpu_to_node(fd->rec_cpu_num));
>   else
>   numa = numa_node_id();
>   ret = hfi1_create_ctxtdata(dd->pport, numa, );
> 

The hfi1 driver depends on X86_64. I'm not sure what this patch buys, can you 
expand a bit?

-Denny


RE: [PATCH] bpf: Avoid allocing memory on memoryless numa node

2020-10-11 Thread Tianxianting
Thanks Alexei for your suggestion,
I will try to do it.

-Original Message-
From: Alexei Starovoitov [mailto:alexei.starovoi...@gmail.com] 
Sent: Monday, October 12, 2020 9:21 AM
To: tianxianting (RD) 
Cc: Alexei Starovoitov ; Daniel Borkmann 
; David S. Miller ; Jakub Kicinski 
; Jesper Dangaard Brouer ; John Fastabend 
; Martin KaFai Lau ; Song Liu 
; Yonghong Song ; Andrii Nakryiko 
; KP Singh ; Network Development 
; bpf ; LKML 

Subject: Re: [PATCH] bpf: Avoid allocing memory on memoryless numa node

On Sat, Oct 10, 2020 at 1:55 AM Xianting Tian  wrote:
>
> In architecture like powerpc, we can have cpus without any local 
> memory attached to it. In such cases the node does not have real memory.
>
> Use local_memory_node(), which is guaranteed to have memory.
> local_memory_node is a noop in other architectures that does not 
> support memoryless nodes.
...
> /* Have map->numa_node, but choose node of redirect target CPU */
> -   numa = cpu_to_node(cpu);
> +   numa = local_memory_node(cpu_to_node(cpu));

There are so many calls to cpu_to_node() throughout the kernel.
Are you going to convert all of them one patch at a time to the above sequence?
Why not do this CONFIG_HAVE_MEMORYLESS_NODES in cpu_to_node() instead?
and save the churn.
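
A rough sketch of the direction suggested above (illustrative only; the helper
name is made up, and if I recall correctly cpu_to_mem() already exists for a
similar purpose on CONFIG_HAVE_MEMORYLESS_NODES builds): fold the
memoryless-node handling into one place instead of patching every
kmalloc_node()/kzalloc_node() caller.

	#include <linux/topology.h>
	#include <linux/mmzone.h>

	/* Hypothetical central helper instead of per-caller fixes. */
	static inline int cpu_to_alloc_node(int cpu)
	{
	#ifdef CONFIG_HAVE_MEMORYLESS_NODES
		/* Map a memoryless node to its nearest node that has memory. */
		return local_memory_node(cpu_to_node(cpu));
	#else
		return cpu_to_node(cpu);
	#endif
	}

	/* Callers would then simply do: numa = cpu_to_alloc_node(cpu); */
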


RE: [PATCH] net: Avoid allocing memory on memoryless numa node

2020-10-11 Thread Tianxianting
Hi Jakub,
Thanks for your suggestion,
Let me try it :-)

-Original Message-
From: Jakub Kicinski [mailto:k...@kernel.org] 
Sent: Monday, October 12, 2020 3:18 AM
To: tianxianting (RD) 
Cc: da...@davemloft.net; net...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net: Avoid allocing memory on memoryless numa node

On Sun, 11 Oct 2020 12:11:40 +0800 Xianting Tian wrote:
> In architecture like powerpc, we can have cpus without any local 
> memory attached to it. In such cases the node does not have real memory.
> 
> Use local_memory_node(), which is guaranteed to have memory.
> local_memory_node is a noop in other architectures that does not 
> support memoryless nodes.
> 
> Signed-off-by: Xianting Tian 
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c index 
> 266073e30..dcb4533ef 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2590,7 +2590,7 @@ static struct xps_map *expand_xps_map(struct xps_map 
> *map, int attr_index,
>   new_map = kzalloc(XPS_MAP_SIZE(alloc_len), GFP_KERNEL);
>   else
>   new_map = kzalloc_node(XPS_MAP_SIZE(alloc_len), GFP_KERNEL,
> -cpu_to_node(attr_index));
> +
> local_memory_node(cpu_to_node(attr_index)));
>   if (!new_map)
>   return NULL;
>  

Are we going to patch all kmalloc_node() callers now to apply 
local_memory_node()?  Can't the allocator take care of this?



RE: [PATCH] fs: use correct parameter in notes of generic_file_llseek_size()

2020-10-08 Thread Tianxianting
Sorry Randy,
I used my cellphone to send the previous mail; it was HTML.

-Original Message-
From: Randy Dunlap [mailto:rdun...@infradead.org] 
Sent: Thursday, October 08, 2020 12:50 AM
To: tianxianting (RD) ; v...@zeniv.linux.org.uk
Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs: use correct parameter in notes of 
generic_file_llseek_size()

On 10/7/20 8:20 AM, Tianxianting wrote:
> hi,
> 
> thanks Randy
> 
> I checked the latest code, seems this patch not applied currently.

Hi--

Please don't send html email.
I'm pretty sure that the mailing list has dropped (discarded) your email 
because it was html.

Probably only Al and I received your email.

Al- Would you prefer that fs/ documentation patches go to someone else for 
merging?  maybe Andrew?

Thanks.

PS: I can't tell if I am writing an html email or not... :(


> 
> From: Randy Dunlap 
> Sent: Friday, September 11, 2020 10:57:24 AM
> To: tianxianting (RD); v...@zeniv.linux.org.uk
> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] fs: use correct parameter in notes of 
> generic_file_llseek_size()
> 
> On 9/10/20 7:06 PM, Tianxianting wrote:
>> Hi viro,
>> Could I get your feedback?
>> This patch fixed the build warning, I think it can be applied, thanks 
>> :)
>>
>> -Original Message-
>> From: tianxianting (RD)
>> Sent: Saturday, September 05, 2020 3:15 PM
>> To: v...@zeniv.linux.org.uk
>> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>> tianxianting (RD) 
>> Subject: [PATCH] fs: use correct parameter in notes of 
>> generic_file_llseek_size()
>>
>> Fix warning when compiling with W=1:
>> fs/read_write.c:88: warning: Function parameter or member 'maxsize' not 
>> described in 'generic_file_llseek_size'
>> fs/read_write.c:88: warning: Excess function parameter 'size' description in 
>> 'generic_file_llseek_size'
>>
>> Signed-off-by: Xianting Tian 
> 
> Acked-by: Randy Dunlap 
> 
> Thanks.
> 
>> ---
>>  fs/read_write.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/read_write.c b/fs/read_write.c index 
>> 5db58b8c7..058563ee2 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -71,7 +71,7 @@ EXPORT_SYMBOL(vfs_setpos);
>>   * @file:file structure to seek on
>>   * @offset:  file offset to seek to
>>   * @whence:  type of seek
>> - * @size:max size of this file in file system
>> + * @maxsize: max size of this file in file system
>>   * @eof: offset used for SEEK_END position
>>   *
>>   * This is a variant of generic_file_llseek that allows passing in a 
>> custom
>>
> 
> 
> --
> ~Randy
> 

--
~Randy



RE: [PATCH] [v3] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

2020-09-28 Thread Tianxianting
Thank you, Bart.
Hi Jens,
Is this version of the patch OK for you?
Looking forward to your further comments :)

-Original Message-
From: Bart Van Assche [mailto:bvanass...@acm.org] 
Sent: Saturday, September 26, 2020 12:07 PM
To: tianxianting (RD) ; ax...@kernel.dk
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v3] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

On 2020-09-25 19:39, Xianting Tian wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c index 
> b3d2785ee..62d152d03 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3256,9 +3256,11 @@ static int __blk_mq_alloc_rq_maps(struct 
> blk_mq_tag_set *set)  {
>   int i;
>  
> - for (i = 0; i < set->nr_hw_queues; i++)
> + for (i = 0; i < set->nr_hw_queues; i++) {
>   if (!__blk_mq_alloc_map_and_request(set, i))
>   goto out_unwind;
> + cond_resched();
> + }
>  
>   return 0;

Reviewed-by: Bart Van Assche 


RE: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

2020-09-25 Thread Tianxianting
Hi Jens,
Thanks a lot for the comments.
I think it is not a hot path; it is only called at system startup or during
device hot-plugging.
So I submitted a v3 patch for your review :)
https://lkml.org/lkml/2020/9/25/1543 

-Original Message-
From: Jens Axboe [mailto:ax...@kernel.dk] 
Sent: Saturday, September 26, 2020 3:26 AM
To: tianxianting (RD) 
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

On 9/17/20 2:13 AM, Xianting Tian wrote:
> We found it takes more time of blk_mq_alloc_rq_maps() in kernel space 
> when testing nvme hot-plugging. The test and anlysis as below.
> 
> Debug code,
> 1, blk_mq_alloc_rq_maps():
> u64 start, end;
> depth = set->queue_depth;
> start = ktime_get_ns();
> pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
> current->pid, current->comm, current->nvcsw, 
> current->nivcsw,
> set->queue_depth, set->nr_hw_queues);
> do {
> err = __blk_mq_alloc_rq_maps(set);
> if (!err)
> break;
> 
> set->queue_depth >>= 1;
> if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
> err = -ENOMEM;
> break;
> }
> } while (set->queue_depth);
> end = ktime_get_ns();
> pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld 
> ns\n",
> current->pid, current->comm,
> current->nvcsw, current->nivcsw, end - start);
> 
> 2, __blk_mq_alloc_rq_maps():
> u64 start, end;
> for (i = 0; i < set->nr_hw_queues; i++) {
> start = ktime_get_ns();
> if (!__blk_mq_alloc_rq_map(set, i))
> goto out_unwind;
> end = ktime_get_ns();
> pr_err("hw queue %d init cost time %lld\n", i, end - start);
> }
> 
> Test nvme hot-plugging with above debug code, we found it totally cost 
> more than 3ms in kernel space without being scheduled out when alloc 
> rqs for all
> 16 hw queues with depth 1024, each hw queue cost about 140-250us. The 
> time cost will be increased with hw queue number and queue depth 
> increasing. And if __blk_mq_alloc_rq_maps() returns -ENOMEM, it will 
> try "queue_depth >>= 1", more time will be consumed.
>   [  428.428771] nvme nvme0: pci function 1:01:00.0
>   [  428.428798] nvme 1:01:00.0: enabling device ( -> 0002)
>   [  428.428806] pcieport 1:00:00.0: can't derive routing for PCI INT 
> A
>   [  428.428809] nvme 1:01:00.0: PCI INT A: no GSI
>   [  432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, 
> nr_hw_queues 1
>   [  432.593404] hw queue 0 init cost time 22883 ns
>   [  432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init 
> cost time 35960 ns
>   [  432.595953] nvme nvme0: 16/0/0 default/read/poll queues
>   [  432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, 
> nr_hw_queues 16
>   [  432.596203] hw queue 0 init cost time 242630 ns
>   [  432.596441] hw queue 1 init cost time 235913 ns
>   [  432.596659] hw queue 2 init cost time 216461 ns
>   [  432.596877] hw queue 3 init cost time 215851 ns
>   [  432.597107] hw queue 4 init cost time 228406 ns
>   [  432.597336] hw queue 5 init cost time 227298 ns
>   [  432.597564] hw queue 6 init cost time 224633 ns
>   [  432.597785] hw queue 7 init cost time 219954 ns
>   [  432.597937] hw queue 8 init cost time 150930 ns
>   [  432.598082] hw queue 9 init cost time 143496 ns
>   [  432.598231] hw queue 10 init cost time 147261 ns
>   [  432.598397] hw queue 11 init cost time 164522 ns
>   [  432.598542] hw queue 12 init cost time 143401 ns
>   [  432.598692] hw queue 13 init cost time 148934 ns
>   [  432.598841] hw queue 14 init cost time 147194 ns
>   [  432.598991] hw queue 15 init cost time 148942 ns
>   [  432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init 
> cost time 3035099 ns
>   [  432.602611]  nvme0n1: p1
> 
> So use this patch to trigger schedule between each hw queue init, to 
> avoid other threads getting stuck. We call cond_resched() only when 
> "queue depth >= 512". We are not in atomic context when executing 
> __blk_mq_alloc_rq_maps(), so it is safe to call cond_resched().
> 
> Signed-off-by: Xianting Tian 
> ---
>  block/blk-mq.c | 

RE: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

2020-09-25 Thread Tianxianting
Hi Jens,
Could I get your comments on the issue? Is it a problem?
Maybe my understanding is superficial; it would be highly appreciated if you
could comment a bit.

-Original Message-
From: tianxianting (RD) 
Sent: Thursday, September 17, 2020 4:13 PM
To: ax...@kernel.dk
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; tianxianting 
(RD) 
Subject: [PATCH] [v2] blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps()

We found that blk_mq_alloc_rq_maps() takes more time in kernel space when 
testing nvme hot-plugging. The test and analysis are below.

Debug code,
1, blk_mq_alloc_rq_maps():
u64 start, end;
depth = set->queue_depth;
start = ktime_get_ns();
pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
current->pid, current->comm, current->nvcsw, 
current->nivcsw,
set->queue_depth, set->nr_hw_queues);
do {
err = __blk_mq_alloc_rq_maps(set);
if (!err)
break;

set->queue_depth >>= 1;
if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
err = -ENOMEM;
break;
}
} while (set->queue_depth);
end = ktime_get_ns();
pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
current->pid, current->comm,
current->nvcsw, current->nivcsw, end - start);

2, __blk_mq_alloc_rq_maps():
u64 start, end;
for (i = 0; i < set->nr_hw_queues; i++) {
start = ktime_get_ns();
if (!__blk_mq_alloc_rq_map(set, i))
goto out_unwind;
end = ktime_get_ns();
pr_err("hw queue %d init cost time %lld\n", i, end - start);
}

Test nvme hot-plugging with above debug code, we found it totally cost more 
than 3ms in kernel space without being scheduled out when alloc rqs for all
16 hw queues with depth 1024, each hw queue cost about 140-250us. The time cost 
will be increased with hw queue number and queue depth increasing. And if 
__blk_mq_alloc_rq_maps() returns -ENOMEM, it will try "queue_depth >>= 1", more 
time will be consumed.
[  428.428771] nvme nvme0: pci function 1:01:00.0
[  428.428798] nvme 1:01:00.0: enabling device ( -> 0002)
[  428.428806] pcieport 1:00:00.0: can't derive routing for PCI INT 
A
[  428.428809] nvme 1:01:00.0: PCI INT A: no GSI
[  432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, 
nr_hw_queues 1
[  432.593404] hw queue 0 init cost time 22883 ns
[  432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init 
cost time 35960 ns
[  432.595953] nvme nvme0: 16/0/0 default/read/poll queues
[  432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, 
nr_hw_queues 16
[  432.596203] hw queue 0 init cost time 242630 ns
[  432.596441] hw queue 1 init cost time 235913 ns
[  432.596659] hw queue 2 init cost time 216461 ns
[  432.596877] hw queue 3 init cost time 215851 ns
[  432.597107] hw queue 4 init cost time 228406 ns
[  432.597336] hw queue 5 init cost time 227298 ns
[  432.597564] hw queue 6 init cost time 224633 ns
[  432.597785] hw queue 7 init cost time 219954 ns
[  432.597937] hw queue 8 init cost time 150930 ns
[  432.598082] hw queue 9 init cost time 143496 ns
[  432.598231] hw queue 10 init cost time 147261 ns
[  432.598397] hw queue 11 init cost time 164522 ns
[  432.598542] hw queue 12 init cost time 143401 ns
[  432.598692] hw queue 13 init cost time 148934 ns
[  432.598841] hw queue 14 init cost time 147194 ns
[  432.598991] hw queue 15 init cost time 148942 ns
[  432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init 
cost time 3035099 ns
[  432.602611]  nvme0n1: p1

So use this patch to trigger a reschedule between each hw queue init, to avoid 
other threads getting stuck. We call cond_resched() only when "queue depth >= 
512". We are not in atomic context when executing __blk_mq_alloc_rq_maps(), so 
it is safe to call cond_resched().

Signed-off-by: Xianting Tian 
---
 block/blk-mq.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b3d2785ee..5a71fe53a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3255,11 +3255,16 @@ void blk_mq_exit_queue(struct request_queue *q)
 static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
int i;
+   unsigned int depth = set->queue_depth;
 
-   for (i = 0; i < set->nr_hw_queues; i++)
+   for (i = 0; i < set->nr_hw_queues;

RE: [PATCH] [v2] nvme: replace meaningless judgement by checking whether req is null

2020-09-22 Thread Tianxianting
Thank you Keith, Christoph,
So I don't need to send a v3 patch?

-Original Message-
From: Christoph Hellwig [mailto:h...@lst.de] 
Sent: Tuesday, September 22, 2020 10:59 PM
To: Keith Busch 
Cc: tianxianting (RD) ; ax...@fb.com; h...@lst.de; 
s...@grimberg.me; linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] nvme: replace meaningless judgement by checking 
whether req is null

On Tue, Sep 22, 2020 at 07:57:05AM -0700, Keith Busch wrote:
> The commit subject is a bit too long. We should really try to keep these
> to 50 characters or less.
> 
>   nvme-pci: fix NULL req in completion handler
> 
> Otherwise, looks fine.
> 
> Reviewed-by: Keith Busch 

Yes.  I was about to apply it with a similar edit, but I'll take yours happily.


RE: [PATCH] [v2] nvme: replace meaningless judgement by checking whether req is null

2020-09-22 Thread Tianxianting
Finally, it is applied :)
Thanks again for all your kind guidance.

-Original Message-
From: Christoph Hellwig [mailto:h...@lst.de] 
Sent: Tuesday, September 22, 2020 11:41 PM
To: tianxianting (RD) 
Cc: Christoph Hellwig ; Keith Busch ; 
ax...@fb.com; s...@grimberg.me; linux-n...@lists.infradead.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] nvme: replace meaningless judgement by checking 
whether req is null

On Tue, Sep 22, 2020 at 03:27:27PM +, Tianxianting wrote:
> Thank you Keith, Christoph,
> So I don't need to send v3 patch? 

No, it is all fine.  I've already applied it locally.


RE: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Tianxianting
I get it now, thanks Keith:)

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Monday, September 21, 2020 11:59 PM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether 
req is null

On Mon, Sep 21, 2020 at 03:49:09PM +, Tianxianting wrote:
> HI Keith,
> Thanks for your comments,
> I will submit a new patch of version 2 for the further reviewing,  v2 patch 
> will contains:
> 1, retain existing judgement and dev_warn;

No no, go ahead and remove the existing check just as you've done. That check 
becomes redundant with the safer one you're adding, and we don't want redundant 
checks in the fast path. My only suggestion is to use the same dev_warn() in 
your new check.

> 2, add the check whether req is null(already did in this patch) 3, 
> simplify and make the changelog succinct according to you said " This is what 
> I'm thinking:".
> Is it right?
> Should I retain the nvme_irq crash log in changelog, mention the difference 
> between nvmeq->q_depth and tagset queue_depth? 

The tagset's queue_depth is a valid point to mention as well. The driver's 
current indirect check is not necessarily in sync with the actual tagset.
 
> Thanks
> 
> -Original Message-
> From: Keith Busch [mailto:kbu...@kernel.org]
> Sent: Monday, September 21, 2020 11:08 PM
> To: tianxianting (RD) 
> Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
> linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] nvme: replace meaningless judgement by checking 
> whether req is null
> 
> On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> > @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > struct nvme_completion *cqe = &nvmeq->cqes[idx];
> > struct request *req;
> >  
> > -   if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> > -   dev_warn(nvmeq->dev->ctrl.device,
> > -   "invalid id %d completed on queue %d\n",
> > -   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > -   return;
> > -   }
> > -
> > /*
> >  * AEN requests are special as they don't time out and can
> >  * survive any kind of queue freeze and often don't respond to @@
> > -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> > *nvmeq, u16 idx)
> > }
> >  
> > req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> > +   if (unlikely(!req)) {
> > +   dev_warn(nvmeq->dev->ctrl.device,
> > +   "req is null for tag %d completed on queue %d\n",
> > +   cqe->command_id, le16_to_cpu(cqe->sq_id));
> > +   return;
> > +   }
> 
> This is making sense now, though I think we should retain the existing
> dev_warn() since it's still accurate and provides continuity for people who 
> are used to looking for these sorts of messages.
> 
> Your changelog is a bit much though. I think we can say it a bit more 
> succinctly. This is what I'm thinking:
> 
>   The driver registers interrupts for queues before initializing the
>   tagset because it uses the number of successful request_irq() calls
>   to configure the tagset parameters. This allows a race condition with
>   the current tag validity check if the controller happens to produce
>   an interrupt with a corrupted CQE before the tagset is initialized.
> 
>   Replace the driver's indirect tag check with the one already provided
>   by the block layer.
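
For reference, a minimal sketch of the check Keith describes above, as it would sit in
nvme_handle_cqe() (illustrative excerpt only; it assumes the nvme_queue_tagset() helper
from the patch and keeps the original dev_warn() wording, which may differ from the
final upstream commit):

	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
	if (unlikely(!req)) {
		/* same message as the old q_depth check, for log continuity */
		dev_warn(nvmeq->dev->ctrl.device,
			"invalid id %d completed on queue %d\n",
			cqe->command_id, le16_to_cpu(cqe->sq_id));
		return;
	}

The bound against nvmeq->q_depth becomes unnecessary because blk_mq_tag_to_rq() already
returns NULL for any tag outside the tagset.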


RE: [PATCH] nvme: replace meaningless judgement by checking whether req is null

2020-09-21 Thread Tianxianting
Hi Keith,
Thanks for your comments,
I will submit a version 2 patch for further review; the v2 patch will contain:
1, retain the existing judgement and dev_warn;
2, add the check for whether req is null (already done in this patch);
3, simplify the changelog and make it succinct, per your "This is what I'm thinking:" note.
Is it right?
Should I retain the nvme_irq crash log in changelog, mention the difference 
between nvmeq->q_depth and tagset queue_depth? 

Thanks

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Monday, September 21, 2020 11:08 PM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether 
req is null

On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> - dev_warn(nvmeq->dev->ctrl.device,
> - "invalid id %d completed on queue %d\n",
> - cqe->command_id, le16_to_cpu(cqe->sq_id));
> - return;
> - }
> -
>   /*
>* AEN requests are special as they don't time out and can
>* survive any kind of queue freeze and often don't respond to @@ 
> -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   }
>  
>   req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> + if (unlikely(!req)) {
> + dev_warn(nvmeq->dev->ctrl.device,
> + "req is null for tag %d completed on queue %d\n",
> + cqe->command_id, le16_to_cpu(cqe->sq_id));
> + return;
> + }

This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people who are 
used to looking for these sorts of messages.

Your changelog is a bit much though. I think we can say it a bit more 
succinctly. This is what I'm thinking:

  The driver registers interrupts for queues before initializing the
  tagset because it uses the number of successful request_irq() calls
  to configure the tagset parameters. This allows a race condition with
  the current tag validity check if the controller happens to produce
  an interrupt with a corrupted CQE before the tagset is initialized.

  Replace the driver's indirect tag check with the one already provided
  by the block layer.


Recall: [PATCH] [v2] nvme: use correct upper limit for tag in nvme_handle_cqe()

2020-09-20 Thread Tianxianting
tianxianting (RD) would like to recall the message "[PATCH] [v2] nvme: use correct upper
limit for tag in nvme_handle_cqe()".

RE: [PATCH] [v2] nvme: use correct upper limit for tag in nvme_handle_cqe()

2020-09-20 Thread Tianxianting
Hi Keith,
I found an extreme case:
in blk_mq_alloc_map_and_requests(), the tagset depth is adjusted by
'set->queue_depth >>= 1' if there is not enough memory for the rqs; a rough sketch
of that fallback follows the link below.
If this happens, the real available number of tags (nr_tags) is much smaller
than nvmeq->q_depth.
So the judgement "if (unlikely(cqe->command_id >= nvmeq->q_depth))" in
nvme_handle_cqe() is really meaningless.

I figured out a new patch, which replaces the meaningless judgement by checking
whether the returned req is null; if it is null, return directly.

Would you please spare several minutes to review the new patch below? Thanks.
https://lkml.org/lkml/2020/9/20/400
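
For context, a rough sketch of the fallback referred to above (paraphrased from memory
of the 5.9-era blk-mq code, not a verbatim quote; helper names may differ):

static int blk_mq_alloc_map_and_requests(struct blk_mq_tag_set *set)
{
	unsigned int depth = set->queue_depth;
	int err;

	do {
		err = __blk_mq_alloc_rq_maps(set);
		if (!err)
			break;

		/* allocation failed: retry with half the depth */
		set->queue_depth >>= 1;
		if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
			err = -ENOMEM;
			break;
		}
	} while (set->queue_depth);

	if (depth != set->queue_depth)
		pr_info("blk-mq: reduced tag depth (%u -> %u)\n",
			depth, set->queue_depth);

	return err;
}

So after such a fallback the tagset depth, and therefore tags->nr_tags, can be far below
nvmeq->q_depth, which is why the q_depth comparison proves nothing about tag validity.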

-----Original Message-
From: tianxianting (RD) 
Sent: Saturday, September 19, 2020 11:15 AM
To: 'Keith Busch' 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] [v2] nvme: use correct upper limit for tag in 
nvme_handle_cqe()

Hi Keith,
Thanks a lot for your comments,
I will try to figure out a safe fix for this issue, then for you review:) 

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Saturday, September 19, 2020 3:21 AM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] nvme: use correct upper limit for tag in 
nvme_handle_cqe()

On Fri, Sep 18, 2020 at 06:44:20PM +0800, Xianting Tian wrote:
> @@ -940,7 +940,9 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> + if (unlikely(cqe->command_id >=
> + nvmeq->qid ? nvmeq->dev->tagset.queue_depth :
> + nvmeq->dev->admin_tagset.queue_depth)) {

Both of these values are set before blk_mq_alloc_tag_set(), so you still have a 
race. The interrupt handler probably just shouldn't be registered with the 
queue before the tagset is initialized since there can't be any work for the 
handler to do before that happens anyway.

The controller is definitely broken, though, and will lead to unavoidable 
corruption if it's really behaving this way.


RE: [PATCH] [v2] nvme: use correct upper limit for tag in nvme_handle_cqe()

2020-09-20 Thread Tianxianting
Hi,
I tested and traced the init flow of the nvme admin queue and io queues in kernel 5.6.
Currently, the code uses nvmeq->q_depth as the upper limit for the tag in
nvme_handle_cqe(); according to the init flows below, we already have the race
today.

Admin queue init flow:
1, set nvmeq->q_depth 32 for admin queue;
2, register irq handler(nvme_irq) for admin queue 0;
3, set admin_tagset.queue_depth to 30 and alloc rqs;
4, nvme irq happen on admin queue;

IO queue init flow:
1, set nvmeq->q_depth 1024 for io queue 1~16;
2, register irq handler(nvme_irq) for io queue 1~16;
3, set tagset.queue_depth to 1023 and alloc rqs;
4, nvme irq happen on io queue;

So we have two issues to fix:
1, register the interrupt handler (nvme_irq) only after the tagset init (step 3 above) is
done, to avoid the race;
2, use the correct upper limit (queue_depth in the tagset) for the tag in nvme_handle_cqe(),
which is the issue solved by the patch I submitted; see the sketch after this message.

Is it right? Thanks a lot.
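
For illustration, a sketch of the bound the v2 patch used (drawn from the hunk quoted
below; note Keith's point that this alone still races, since both queue_depth values are
set before blk_mq_alloc_tag_set()):

	/* pick the tagset that actually backs this queue, not nvmeq->q_depth */
	u32 limit = nvmeq->qid ? nvmeq->dev->tagset.queue_depth :
				 nvmeq->dev->admin_tagset.queue_depth;

	if (unlikely(cqe->command_id >= limit)) {
		dev_warn(nvmeq->dev->ctrl.device,
			"invalid id %d completed on queue %d\n",
			cqe->command_id, le16_to_cpu(cqe->sq_id));
		return;
	}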

-Original Message-----
From: tianxianting (RD) 
Sent: Saturday, September 19, 2020 11:15 AM
To: 'Keith Busch' 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] [v2] nvme: use correct upper limit for tag in 
nvme_handle_cqe()

Hi Keith,
Thanks a lot for your comments,
I will try to figure out a safe fix for this issue, then for you review:) 

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Saturday, September 19, 2020 3:21 AM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] nvme: use correct upper limit for tag in 
nvme_handle_cqe()

On Fri, Sep 18, 2020 at 06:44:20PM +0800, Xianting Tian wrote:
> @@ -940,7 +940,9 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> + if (unlikely(cqe->command_id >=
> + nvmeq->qid ? nvmeq->dev->tagset.queue_depth :
> + nvmeq->dev->admin_tagset.queue_depth)) {

Both of these values are set before blk_mq_alloc_tag_set(), so you still have a 
race. The interrupt handler probably just shouldn't be registered with the 
queue before the tagset is initialized since there can't be any work for the 
handler to do before that happens anyway.

The controller is definitely broken, though, and will lead to unavoidable 
corruption if it's really behaving this way.


RE: [PATCH] [v2] nvme: use correct upper limit for tag in nvme_handle_cqe()

2020-09-18 Thread Tianxianting
Hi Keith,
Thanks a lot for your comments,
I will try to figure out a safe fix for this issue, then for you review:) 

-Original Message-
From: Keith Busch [mailto:kbu...@kernel.org] 
Sent: Saturday, September 19, 2020 3:21 AM
To: tianxianting (RD) 
Cc: ax...@fb.com; h...@lst.de; s...@grimberg.me; 
linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] nvme: use correct upper limit for tag in 
nvme_handle_cqe()

On Fri, Sep 18, 2020 at 06:44:20PM +0800, Xianting Tian wrote:
> @@ -940,7 +940,9 @@ static inline void nvme_handle_cqe(struct nvme_queue 
> *nvmeq, u16 idx)
>   struct nvme_completion *cqe = &nvmeq->cqes[idx];
>   struct request *req;
>  
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> + if (unlikely(cqe->command_id >=
> + nvmeq->qid ? nvmeq->dev->tagset.queue_depth :
> + nvmeq->dev->admin_tagset.queue_depth)) {

Both of these values are set before blk_mq_alloc_tag_set(), so you still have a 
race. The interrupt handler probably just shouldn't be registered with the 
queue before the tagset is initialized since there can't be any work for the 
handler to do before that happens anyway.

The controller is definitely broken, though, and will lead to unavoidable 
corruption if it's really behaving this way.


RE: [PATCH] nvme: use correct upper limit for tag in nvme_handle_cqe()

2020-09-18 Thread Tianxianting
I am very sorry,
Please ignore this patch; the code needs to be rewritten.

-Original Message-
From: tianxianting (RD) 
Sent: Friday, September 18, 2020 3:45 PM
To: kbu...@kernel.org; ax...@fb.com; h...@lst.de; s...@grimberg.me
Cc: linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org; tianxianting 
(RD) 
Subject: [PATCH] nvme: use correct upper limit for tag in nvme_handle_cqe()

We met a crash issue when hot-inserting an nvme device: blk_mq_tag_to_rq() returned
null (req=null), then a crash happened in nvme_end_request():
req = blk_mq_tag_to_rq();
struct nvme_request *rq = nvme_req(req); //rq = req + 1
rq->result = result;  <==crash here!!!

[ 1124.256246] nvme nvme5: pci function :e1:00.0
[ 1124.256323] nvme :e1:00.0: enabling device ( -> 0002)
[ 1125.720859] nvme nvme5: 96/0/0 default/read/poll queues
[ 1125.732483]  nvme5n1: p1 p2 p3
[ 1125.788049] BUG: unable to handle kernel NULL pointer dereference at 
0130
[ 1125.788054] PGD 0 P4D 0
[ 1125.788057] Oops: 0002 [#1] SMP NOPTI
[ 1125.788059] CPU: 50 PID: 0 Comm: swapper/50 Kdump: loaded Tainted: G 
  --- -t - 4.18.0-147.el8.x86_64 #1
[ 1125.788065] RIP: 0010:nvme_irq+0xe8/0x240 [nvme]
[ 1125.788068] RSP: 0018:916b8ec83ed0 EFLAGS: 00010813
[ 1125.788069] RAX:  RBX: 918ae9211b00 RCX: 

[ 1125.788070] RDX: 400b RSI:  RDI: 

[ 1125.788071] RBP: 918ae887 R08: 0004 R09: 
918ae887
[ 1125.788072] R10:  R11:  R12: 

[ 1125.788073] R13: 0001 R14:  R15: 
0001
[ 1125.788075] FS:  () GS:916b8ec8() 
knlGS:
[ 1125.788075] CS:  0010 DS:  ES:  CR0: 80050033
[ 1125.788076] CR2: 0130 CR3: 001768f0 CR4: 
00340ee0
[ 1125.788077] Call Trace:
[ 1125.788080]  
[ 1125.788085]  __handle_irq_event_percpu+0x40/0x180
[ 1125.788087]  handle_irq_event_percpu+0x30/0x80
[ 1125.788089]  handle_irq_event+0x36/0x53
[ 1125.788090]  handle_edge_irq+0x82/0x190
[ 1125.788094]  handle_irq+0xbf/0x100
[ 1125.788098]  do_IRQ+0x49/0xd0
[ 1125.788100]  common_interrupt+0xf/0xf

The analysis of the possible cause of the above crash is as below.

According to our test, in nvme_pci_enable(), 'dev->q_depth' is set to 1024, in 
nvme_create_io_queues()->nvme_alloc_queue(), 'nvmeq->q_depth' is set to 
'dev->q_depth'. In nvme_dev_add(), 'dev->tagset.queue_depth' is set to 1023:
 dev->tagset.queue_depth = min_t(unsigned int, dev->q_depth, 
BLK_MQ_MAX_DEPTH) - 1;
	//why -1?? first introduced by commit a4aea562
In nvme_alloc_queue(), 'nvmeq->q_depth' is set to 'dev->q_depth'.
So we got the below values for the multiple depths:
dev->q_depth= 1024
dev->tagset.queue_depth = 1023
nvmeq->q_depth  = 1024

In blk_mq_alloc_rqs(), it allocated 1023(dev->tagset.queue_depth) rqs for one 
hw queue. In nvme_alloc_queue(), it allocated 1024(nvmeq->q_depth) cqes for 
nvmeq->cqes[]. When processing a cqe in nvme_handle_cqe(), it gets it by:
struct nvme_completion *cqe = &nvmeq->cqes[idx];

We also added below prints in nvme_handle_cqe() and blk_mq_tag_to_rq():
static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
{
volatile struct nvme_completion *cqe = &nvmeq->cqes[idx];
struct request *req;

//debug print
dev_warn(nvmeq->dev->ctrl.device,
"command_id %d completed on queue %d, nvmeq q_depth %d, 
nvme tagset q_depth %d\n",
 cqe->command_id, le16_to_cpu(cqe->sq_id),
 nvmeq->q_depth, nvmeq->dev->tagset.queue_depth);

if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
dev_warn(nvmeq->dev->ctrl.device,
"invalid id %d completed on queue %d\n",
cqe->command_id, le16_to_cpu(cqe->sq_id));
return;
}

... ...

req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), 
cqe->command_id);
... ...
}

struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int 
tag)
{
//debug print
printk("tag, nr_tags:%d %d\n", tag, tags->nr_tags);
if (tag < tags->nr_tags) {
prefetch(tags->rqs[tag]);
return tags->rqs[tag];
}
return NULL;
}

RE: [PATCH] tracing: use __this_cpu_read() in trace_buffered_event_enable()

2020-09-17 Thread Tianxianting
Hi Steven,
Thanks for your reply:)

-Original Message-
From: Steven Rostedt [mailto:rost...@goodmis.org] 
Sent: Friday, September 18, 2020 8:31 AM
To: tianxianting (RD) 
Cc: mi...@redhat.com; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] tracing: use __this_cpu_read() in 
trace_buffered_event_enable()

Sorry for the late reply (been busy with Linux Plumbers, took a vacation, and 
then trying to catch up on all my email for the last two
months!)

But I just wanted to let you know that I added this to my queue.

Thanks!

-- Steve


On Thu, 13 Aug 2020 19:28:03 +0800
Xianting Tian  wrote:

> The code is executed with preemption disabled, so it's safe to use 
> __this_cpu_read().
> 
> Signed-off-by: Xianting Tian 
> ---
>  kernel/trace/trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 
> bb6226972..7d0d71ce9 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2514,7 +2514,7 @@ void trace_buffered_event_enable(void)
>  
>   preempt_disable();
>   if (cpu == smp_processor_id() &&
> - this_cpu_read(trace_buffered_event) !=
> + __this_cpu_read(trace_buffered_event) !=
>   per_cpu(trace_buffered_event, cpu))
>   WARN_ON_ONCE(1);
>   preempt_enable();
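
As an aside, a minimal illustration of the distinction (not from the kernel tree;
demo_counter is a made-up per-CPU variable): __this_cpu_read() skips the preemption
safety checks that this_cpu_read() carries, so it is only valid where preemption is
already disabled, as it is in the hunk above.

DEFINE_PER_CPU(int, demo_counter);

static int read_demo_counter(void)
{
	int val;

	preempt_disable();
	/* safe: preemption is off, so the CPU cannot change under us */
	val = __this_cpu_read(demo_counter);
	preempt_enable();

	return val;
}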



RE: [PATCH] ipmi: add retry in try_get_dev_id()

2020-09-16 Thread Tianxianting
Thank you Corey for your kind guidance on these three patches :)

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Wednesday, September 16, 2020 10:01 PM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ipmi: add retry in try_get_dev_id()

On Wed, Sep 16, 2020 at 02:21:29PM +0800, Xianting Tian wrote:
> Use a retry mechanism to give the device more opportunities to correctly
> respond to the kernel when we receive specific completion codes.
> 
> This is similar to what we done in __get_device_id().

Thanks.  I moved GET_DEVICE_ID_MAX_RETRY to include/linux/ipmi.h since uapi is 
for things exported to userspace.  But this is good, it's in my next tree.

-corey

> 
> Signed-off-by: Xianting Tian 
> ---
>  drivers/char/ipmi/ipmi_msghandler.c |  2 --
>  drivers/char/ipmi/ipmi_si_intf.c| 17 +
>  include/uapi/linux/ipmi.h   |  2 ++
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
> b/drivers/char/ipmi/ipmi_msghandler.c
> index b9685093e..75cb7e062 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -62,8 +62,6 @@ enum ipmi_panic_event_op {  #define 
> IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE  #endif
>  
> -#define GET_DEVICE_ID_MAX_RETRY  5
> -
>  static enum ipmi_panic_event_op ipmi_send_panic_event = 
> IPMI_PANIC_DEFAULT;
>  
>  static int panic_op_write_handler(const char *val, diff --git 
> a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> index 77b8d551a..beeb705f1 100644
> --- a/drivers/char/ipmi/ipmi_si_intf.c
> +++ b/drivers/char/ipmi/ipmi_si_intf.c
> @@ -1316,6 +1316,7 @@ static int try_get_dev_id(struct smi_info *smi_info)
>   unsigned char *resp;
>   unsigned long resp_len;
>   int   rv = 0;
> + unsigned int  retry_count = 0;
>  
>   resp = kmalloc(IPMI_MAX_MSG_LENGTH, GFP_KERNEL);
>   if (!resp)
> @@ -1327,6 +1328,8 @@ static int try_get_dev_id(struct smi_info *smi_info)
>*/
>   msg[0] = IPMI_NETFN_APP_REQUEST << 2;
>   msg[1] = IPMI_GET_DEVICE_ID_CMD;
> +
> +retry:
>   smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
>  
>   rv = wait_for_msg_done(smi_info);
> @@ -1339,6 +1342,20 @@ static int try_get_dev_id(struct smi_info *smi_info)
>   /* Check and record info from the get device id, in case we need it. */
>   rv = ipmi_demangle_device_id(resp[0] >> 2, resp[1],
>   resp + 2, resp_len - 2, _info->device_id);
> + if (rv) {
> + /* record completion code */
> + char cc = *(resp + 2);
> +
> + if ((cc == IPMI_DEVICE_IN_FW_UPDATE_ERR
> + || cc == IPMI_DEVICE_IN_INIT_ERR
> + || cc == IPMI_NOT_IN_MY_STATE_ERR)
> + && ++retry_count <= GET_DEVICE_ID_MAX_RETRY) {
> + dev_warn(smi_info->io.dev,
> + "retry to get device id as completion code 
> 0x%x\n",
> +  cc);
> + goto retry;
> + }
> + }
>  
>  out:
>   kfree(resp);
> diff --git a/include/uapi/linux/ipmi.h b/include/uapi/linux/ipmi.h 
> index 32d148309..bc57f07e3 100644
> --- a/include/uapi/linux/ipmi.h
> +++ b/include/uapi/linux/ipmi.h
> @@ -426,4 +426,6 @@ struct ipmi_timing_parms {
>  #define IPMICTL_GET_MAINTENANCE_MODE_CMD _IOR(IPMI_IOC_MAGIC, 30, int)
>  #define IPMICTL_SET_MAINTENANCE_MODE_CMD _IOW(IPMI_IOC_MAGIC, 31, int)
>  
> +#define GET_DEVICE_ID_MAX_RETRY  5
> +
>  #endif /* _UAPI__LINUX_IPMI_H */
> --
> 2.17.1
> 


RE: [PATCH] [v2] ipmi: retry to get device id when error

2020-09-16 Thread Tianxianting
Hi Corey,
Would you please review this patch: add the same retry in try_get_dev_id()
https://lkml.org/lkml/2020/9/16/244

Thanks a lot.

-Original Message-
From: tianxianting (RD) 
Sent: Tuesday, September 15, 2020 11:20 PM
To: 'miny...@acm.org' 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] [v2] ipmi: retry to get device id when error

I get it now, thank you Corey :)
I will send the patch for your review tomorrow.

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Tuesday, September 15, 2020 10:53 PM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] ipmi: retry to get device id when error

On Tue, Sep 15, 2020 at 09:40:02AM +, Tianxianting wrote:
> Hi Corey,
> Thanks for your comments,
> Please review these two patches, which are based on your guide.
> 1, [PATCH] ipmi: print current state when error
> https://lkml.org/lkml/2020/9/15/183
> 2, [PATCH] [v3] ipmi: retry to get device id when error
> https://lkml.org/lkml/2020/9/15/156

Patches are applied and in my next tree.

> 
> As you said "You are having the same issue in the ipmi_si code. It's choosing 
> defaults, but that's not ideal.  You probably need to handle this there, too, 
> in a separate patch."
> I am not sure whether I grasped what you said, The print ' device id 
> demangle failed: -22' in commit message, is just triggered by 
> bmc_device_id_handler->ipmi_demangle_device_id, this is the issue we met and 
> is solving.
> I found try_get_dev_id(in drivers/char/ipmi/ipmi_si_intf.c) also called 
> ipmi_demangle_device_id(), do you mean if this ipmi_demangle_device_id() 
> returned error, we also need to retry?

Yes, I think so, retrying in try_get_dev_id() would be a good idea, I think.  
You are probably getting sub-optimal performance if you don't do this.

Thanks,

-corey

> 
> Thanks a lot.
> 
> -Original Message-
> From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey 
> Minyard
> Sent: Monday, September 14, 2020 11:40 PM
> To: tianxianting (RD) 
> Cc: a...@arndb.de; gre...@linuxfoundation.org; 
> openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] [v2] ipmi: retry to get device id when error
> 
> On Mon, Sep 14, 2020 at 04:13:13PM +0800, Xianting Tian wrote:
> > We can't get bmc's device id with low probability when loading ipmi 
> > driver, it caused bmc device register failed. When this issue 
> > happened, we got below kernel printks:
> 
> This patch is moving in the right direction.  For the final patch(es), I can 
> clean up the english grammar issues, since that's not your native language.  
> A few comments:
> 
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> > device id demangle failed: -22
> 
> You are having the same issue in the ipmi_si code.  It's choosing defaults, 
> but that's not ideal.  You probably need to handle this there, too, in a 
> separate patch.
> 
> Can you create a separate patch to add a dev_warn() to the BT code when it 
> returns IPMI_NOT_IN_MY_STATE_ERR, like I asked previously?  And print the 
> current state when it happens.  That way we know where this issue is coming 
> from and possibly fix the state machine.  I'm thinking that the BMC is just 
> not responding, but I'd like to be sure.
> 
> Other comments inline...
> 
> > [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
> > [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> > id: -5
> > [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register
> > device: error -5
> > 
> > When this issue happened, we want to manually unload the driver and 
> > try to load it again, but it can't be unloaded by 'rmmod' as it is already 
> > 'in use'.
> > 
> > We add below 'printk' in handle_one_recv_msg(), when this issue 
> > happened, the msg we received is "Recv: 1c 01 d5", which means the 
> > data_len is 1, data[0] is 0xd5(completion code), which means "bmc cannot 
> > execute command.
> > Command, or request parameter(s), not supported in present state".
> > Debug code:
> > static int handle_one_recv_msg(struct ipmi_smi *intf,
> >struct ipmi_smi_msg *msg) {
> > printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
> > ... ...
> > }
> > Then 

RE: [PATCH] [v2] ipmi: retry to get device id when error

2020-09-15 Thread Tianxianting
I get it now, thank you Corey :)
I will send the patch for your review tomorrow.

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Tuesday, September 15, 2020 10:53 PM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] ipmi: retry to get device id when error

On Tue, Sep 15, 2020 at 09:40:02AM +, Tianxianting wrote:
> Hi Corey,
> Thanks for your comments,
> Please review these two patches, which are based on your guide.
> 1, [PATCH] ipmi: print current state when error
> https://lkml.org/lkml/2020/9/15/183
> 2, [PATCH] [v3] ipmi: retry to get device id when error
> https://lkml.org/lkml/2020/9/15/156

Patches are applied and in my next tree.

> 
> As you said "You are having the same issue in the ipmi_si code. It's choosing 
> defaults, but that's not ideal.  You probably need to handle this there, too, 
> in a separate patch."
> I am not sure whether I grasped what you said, The print ' device id 
> demangle failed: -22' in commit message, is just triggered by 
> bmc_device_id_handler->ipmi_demangle_device_id, this is the issue we met and 
> is solving.
> I found try_get_dev_id(in drivers/char/ipmi/ipmi_si_intf.c) also called 
> ipmi_demangle_device_id(), do you mean if this ipmi_demangle_device_id() 
> returned error, we also need to retry?

Yes, I think so, retrying in try_get_dev_id() would be a good idea, I think.  
You are probably getting sub-optimal performance if you don't do this.

Thanks,

-corey

> 
> Thanks a lot.
> 
> -Original Message-
> From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey 
> Minyard
> Sent: Monday, September 14, 2020 11:40 PM
> To: tianxianting (RD) 
> Cc: a...@arndb.de; gre...@linuxfoundation.org; 
> openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] [v2] ipmi: retry to get device id when error
> 
> On Mon, Sep 14, 2020 at 04:13:13PM +0800, Xianting Tian wrote:
> > We can't get bmc's device id with low probability when loading ipmi 
> > driver, it caused bmc device register failed. When this issue 
> > happened, we got below kernel printks:
> 
> This patch is moving in the right direction.  For the final patch(es), I can 
> clean up the english grammar issues, since that's not your native language.  
> A few comments:
> 
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> > device id demangle failed: -22
> 
> You are having the same issue in the ipmi_si code.  It's choosing defaults, 
> but that's not ideal.  You probably need to handle this there, too, in a 
> separate patch.
> 
> Can you create a separate patch to add a dev_warn() to the BT code when it 
> returns IPMI_NOT_IN_MY_STATE_ERR, like I asked previously?  And print the 
> current state when it happens.  That way we know where this issue is coming 
> from and possibly fix the state machine.  I'm thinking that the BMC is just 
> not responding, but I'd like to be sure.
> 
> Other comments inline...
> 
> > [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
> > [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> > id: -5
> > [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register
> > device: error -5
> > 
> > When this issue happened, we want to manually unload the driver and 
> > try to load it again, but it can't be unloaded by 'rmmod' as it is already 
> > 'in use'.
> > 
> > We add below 'printk' in handle_one_recv_msg(), when this issue 
> > happened, the msg we received is "Recv: 1c 01 d5", which means the 
> > data_len is 1, data[0] is 0xd5(completion code), which means "bmc cannot 
> > execute command.
> > Command, or request parameter(s), not supported in present state".
> > Debug code:
> > static int handle_one_recv_msg(struct ipmi_smi *intf,
> >struct ipmi_smi_msg *msg) {
> > printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
> > ... ...
> > }
> > Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> > and 'data[0] != 0'.
> > 
> > We used this patch to retry to get device id when error happen, we 
> > reproduced this issue again and the retry succeed on the first 
> > retry, we finally got the correct msg and then all is ok:
> > Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
> > 
> > So use retry machanism in 

RE: [PATCH] [v2] ipmi: retry to get device id when error

2020-09-15 Thread Tianxianting
Hi Corey,
Thanks for your comments,
Please review these two patches, which are based on your guidance.
1, [PATCH] ipmi: print current state when error
https://lkml.org/lkml/2020/9/15/183 
2, [PATCH] [v3] ipmi: retry to get device id when error
https://lkml.org/lkml/2020/9/15/156 

As you said "You are having the same issue in the ipmi_si code. It's choosing 
defaults, but that's not ideal.  You probably need to handle this there, too, 
in a separate patch."
I am not sure whether I grasped what you said.
The print 'device id demangle failed: -22' in the commit message is just
triggered by bmc_device_id_handler->ipmi_demangle_device_id; this is the issue
we met and are solving.
I found that try_get_dev_id() (in drivers/char/ipmi/ipmi_si_intf.c) also calls
ipmi_demangle_device_id(); do you mean that if this ipmi_demangle_device_id()
returns an error, we also need to retry?

Thanks a lot.

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Monday, September 14, 2020 11:40 PM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] ipmi: retry to get device id when error

On Mon, Sep 14, 2020 at 04:13:13PM +0800, Xianting Tian wrote:
> We can't get bmc's device id with low probability when loading ipmi 
> driver, it caused bmc device register failed. When this issue 
> happened, we got below kernel printks:

This patch is moving in the right direction.  For the final patch(es), I can 
clean up the english grammar issues, since that's not your native language.  A 
few comments:

>   [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> device id demangle failed: -22

You are having the same issue in the ipmi_si code.  It's choosing defaults, but 
that's not ideal.  You probably need to handle this there, too, in a separate 
patch.

Can you create a separate patch to add a dev_warn() to the BT code when it 
returns IPMI_NOT_IN_MY_STATE_ERR, like I asked previously?  And print the 
current state when it happens.  That way we know where this issue is coming 
from and possibly fix the state machine.  I'm thinking that the BMC is just not 
responding, but I'd like to be sure.

Other comments inline...

>   [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
>   [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
>   [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> id: -5
>   [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register 
> device: error -5
> 
> When this issue happened, we want to manually unload the driver and 
> try to load it again, but it can't be unloaded by 'rmmod' as it is already 
> 'in use'.
> 
> We add below 'printk' in handle_one_recv_msg(), when this issue 
> happened, the msg we received is "Recv: 1c 01 d5", which means the 
> data_len is 1, data[0] is 0xd5(completion code), which means "bmc cannot 
> execute command.
> Command, or request parameter(s), not supported in present state".
>   Debug code:
>   static int handle_one_recv_msg(struct ipmi_smi *intf,
>struct ipmi_smi_msg *msg) {
>   printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
>   ... ...
>   }
> Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> and 'data[0] != 0'.
> 
> We used this patch to retry to get device id when error happen, we 
> reproduced this issue again and the retry succeed on the first retry, 
> we finally got the correct msg and then all is ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
> 
> So use a retry mechanism in this patch to give the bmc more opportunity to
> correctly respond to the kernel when we receive a specific completion code.
> 
> Signed-off-by: Xianting Tian 
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 29 +
>  include/uapi/linux/ipmi_msgdefs.h   |  2 ++
>  2 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
> b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..07d5be2cd 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IPMI_DRIVER_VERSION "39.2"
>  
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {  #else  #define 
> IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE  #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY  5
> +
>  static enum ipmi_panic_event_op ipmi_send_panic_event = 
> IPMI_PANIC_DEFAULT;
>  
>  static int panic_op_write_handler(const char *val, @@ -317,6 +321,7 
> @@ st

RE: [PATCH] ipmi: retry to get device id when error

2020-09-14 Thread Tianxianting
Hi Corey,
Thanks a lot for your kind guidance.

I sent the version 2 patch to you: [PATCH] [v2] ipmi: retry to get device id
when error.
If there is still something wrong or I missed something, please correct me :)

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Monday, September 14, 2020 1:26 AM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ipmi: retry to get device id when error

On Sun, Sep 13, 2020 at 02:10:01PM +, Tianxianting wrote:
> Hi Corey
> Thanks for your quickly reply,
> We didn't try the method you mentioned, actually, I didn't know it 
> before you told me:( The issue ever occurred on our 2 ceph storage server 
> both with low probability.
> We finally use this patch to solve the issue, it can automatically solve the 
> issue when it happened. So no need to judge and reload ipmi driver manually 
> or develop additional scripts to make it.
> The 1 second delay is acceptable to us.
> 
> If there really isn't a BMC out there, ipmi driver will not be loaded, is it 
> right?

No, there are systems that have IPMI controllers that are specified in the 
ACPI/SMBIOS tables but have no IPMI controller.

> 
> May be we can adjust to retry 3 times with 500ms interval? 

Maybe there is another way.  What I'm guessing is happening is not that the 
interface is lossy, but that the BMC is in the middle of a reset or an upgrade. 
 The D5 completion code means: Cannot execute command. Command, or request 
parameter(s), not supported in present state.

That error is also returned from bt_start_transaction() in the driver if the 
driver is not in the right state when starting a transaction, which may point 
to a bug in the BT state machine.  Search for IPMI_NOT_IN_MY_STATE_ERR in 
drivers/char/ipmi/ipmi_bt_sm.c.

If it's coming fron the state machine, it would be handy to know what state it 
was in when the error happened to help trace the bug down.
That's what I would suggest first.  Fix the fundamental bug, if you can.
a pr_warn() added to the code there would be all that's needed, and thats 
probably a good permanent addition.

I would be ok with a patch that retried some number of times if it got a
D5 completion code.  That wouldn't have any effect on other systems.
Plus you could add a D1 and D2 completion code, which are similar things.  Add 
the new completion codes to include/uapi/linux/ipmi_msgdefs.h and implement the 
retry.  That makes sense from a general point of view.

Thanks,

-corey

> 
> Thanks in advance if you can feedback again.
> 
> -Original Message-
> From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey 
> Minyard
> Sent: Sunday, September 13, 2020 8:40 PM
> To: tianxianting (RD) 
> Cc: a...@arndb.de; gre...@linuxfoundation.org; 
> openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ipmi: retry to get device id when error
> 
> On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> > We can't get bmc's device id with low probability when loading ipmi 
> > driver, it caused bmc device register failed. This issue may caused 
> > by bad lpc signal quality. When this issue happened, we got below kernel 
> > printks:
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> > device id demangle failed: -22
> > [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
> > [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> > [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> > id: -5
> > [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register
> > device: error -5
> > 
> > When this issue happened, we want to manually unload the driver and 
> > try to load it again, but it can't be unloaded by 'rmmod' as it is already 
> > 'in use'.
> 
> I'm not sure this patch is a good idea; it would cause a long boot delay in 
> situations where there really isn't a BMC out there.  Yes, it happens.
> 
> You don't have to reload the driver to add a device, though.  You can hot-add 
> devices using /sys/modules/ipmi_si/parameters/hotmod.  Look in 
> Documentation/driver-api/ipmi.rst for details.
> 
> Does that work for you?
> 
> -corey
> 
> > 
> > We add below 'printk' in handle_one_recv_msg(), when this issue 
> > happened, the msg we received is "Recv: 1c 01 d5", which means the 
> > data_len is 1, data[0] is 0xd5.
> > Debug code:
> > static int handle_one_recv_msg(struct ipmi_smi *intf,
> >struct ipmi_smi_msg *msg) {
> > printk("Recv: %*ph\n", msg-

RE: [PATCH] ipmi: retry to get device id when error

2020-09-13 Thread Tianxianting
Hi Corey
Thanks for your quick reply.
We didn't try the method you mentioned; actually, I didn't know about it before you
told me :(
The issue has occurred on both of our 2 ceph storage servers, with low probability.
We finally used this patch to solve the issue; it can automatically resolve the
issue when it happens, so there is no need to detect it and reload the ipmi driver
manually or develop additional scripts to do that.
The 1 second delay is acceptable to us.

If there really isn't a BMC out there, the ipmi driver will not be loaded, is that
right?

Maybe we can adjust it to retry 3 times with a 500ms interval?

Thanks in advance if you can give feedback again.

-Original Message-
From: Corey Minyard [mailto:tcminy...@gmail.com] On Behalf Of Corey Minyard
Sent: Sunday, September 13, 2020 8:40 PM
To: tianxianting (RD) 
Cc: a...@arndb.de; gre...@linuxfoundation.org; 
openipmi-develo...@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ipmi: retry to get device id when error

On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> We can't get bmc's device id with low probability when loading ipmi 
> driver, it caused bmc device register failed. This issue may caused by 
> bad lpc signal quality. When this issue happened, we got below kernel printks:
>   [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: 
> device id demangle failed: -22
>   [Wed Sep  9 19:52:03 2020] IPMI BT: using default values
>   [Wed Sep  9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
>   [Wed Sep  9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device 
> id: -5
>   [Wed Sep  9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register 
> device: error -5
> 
> When this issue happened, we want to manually unload the driver and 
> try to load it again, but it can't be unloaded by 'rmmod' as it is already 
> 'in use'.

I'm not sure this patch is a good idea; it would cause a long boot delay in 
situations where there really isn't a BMC out there.  Yes, it happens.

You don't have to reload the driver to add a device, though.  You can hot-add 
devices using /sys/modules/ipmi_si/parameters/hotmod.  Look in 
Documentation/driver-api/ipmi.rst for details.

Does that work for you?

-corey

> 
> We add below 'printk' in handle_one_recv_msg(), when this issue 
> happened, the msg we received is "Recv: 1c 01 d5", which means the 
> data_len is 1, data[0] is 0xd5.
>   Debug code:
>   static int handle_one_recv_msg(struct ipmi_smi *intf,
>struct ipmi_smi_msg *msg) {
>   printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
>   ... ...
>   }
> Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> and 'data[0] != 0'.
> 
> We used this patch to retry to get device id when error happen, we 
> reproduced this issue again and the retry succeed on the first retry, 
> we finally got the correct msg and then all is ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
> 
> So use a retry mechanism in this patch to give the bmc more opportunity to
> correctly respond to the kernel.
> 
> Signed-off-by: Xianting Tian 
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
> b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..bfb2de77a 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IPMI_DRIVER_VERSION "39.2"
>  
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {  #else  #define 
> IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE  #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY  5
> +
>  static enum ipmi_panic_event_op ipmi_send_panic_event = 
> IPMI_PANIC_DEFAULT;
>  
>  static int panic_op_write_handler(const char *val, @@ -2426,19 
> +2430,26 @@ send_get_device_id_cmd(struct ipmi_smi *intf)  static int 
> __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)  {
>   int rv;
> -
> - bmc->dyn_id_set = 2;
> + unsigned int retry_count = 0;
>  
>   intf->null_user_handler = bmc_device_id_handler;
>  
> +retry:
> + bmc->dyn_id_set = 2;
> +
>   rv = send_get_device_id_cmd(intf);
>   if (rv)
>   return rv;
>  
>   wait_event(intf->waitq, bmc->dyn_id_set != 2);
>  
> - if (!bmc->dyn_id_set)
> + if (!bmc->dyn_id_set) {
> + msleep(1000);
> + if (++retry_count <= GET_DEVICE_ID_MAX_RETRY)
> + goto retry;
> +
>   rv = -EIO; /* Something went wrong in the fetch. */
> + }
>  
>   /* dyn_id_set makes the id data available. */
>   smp_rmb();
> --
> 2.17.1
> 


RE: [PATCH] fs: use correct parameter in notes of generic_file_llseek_size()

2020-09-10 Thread Tianxianting
Hi viro,
Could I get your feedback?
This patch fixes the build warning; I think it can be applied, thanks :)

-Original Message-
From: tianxianting (RD) 
Sent: Saturday, September 05, 2020 3:15 PM
To: v...@zeniv.linux.org.uk
Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; tianxianting 
(RD) 
Subject: [PATCH] fs: use correct parameter in notes of 
generic_file_llseek_size()

Fix warning when compiling with W=1:
fs/read_write.c:88: warning: Function parameter or member 'maxsize' not 
described in 'generic_file_llseek_size'
fs/read_write.c:88: warning: Excess function parameter 'size' description in 
'generic_file_llseek_size'

Signed-off-by: Xianting Tian 
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c7..058563ee2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -71,7 +71,7 @@ EXPORT_SYMBOL(vfs_setpos);
  * @file:  file structure to seek on
  * @offset:file offset to seek to
  * @whence:type of seek
- * @size:  max size of this file in file system
+ * @maxsize:   max size of this file in file system
  * @eof:   offset used for SEEK_END position
  *
  * This is a variant of generic_file_llseek that allows passing in a custom
-- 
2.17.1



RE: [PATCH] block: remove redundant empty check of mq_list

2020-09-10 Thread Tianxianting
Hi Jens,
Thanks for your feedback,
Yes, blk_flush_plug_list() is only caller of blk_mq_flush_plug_list().
So I checked the callers of blk_flush_plug_list(), found below code path will 
call blk_flush_plug_list():
io_schedule_prepare/sched_submit_work->blk_schedule_flush_plug
writeback_sb_inodes->blk_flush_plug
blk_finish_plug
dm_submit_bio/__submit_bio_noacct_mq/__submit_bio->blk_mq_submit_bio
blk_poll

So I think there are still many chances to do the redundant judge?

-Original Message-
From: Jens Axboe [mailto:ax...@kernel.dk] 
Sent: Wednesday, September 09, 2020 10:21 PM
To: tianxianting (RD) ; a...@kernel.org; 
dan...@iogearbox.net; ka...@fb.com; songliubrav...@fb.com; y...@fb.com; 
andr...@fb.com; john.fastab...@gmail.com; kpsi...@chromium.org
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; 
net...@vger.kernel.org; b...@vger.kernel.org
Subject: Re: [PATCH] block: remove redundant empty check of mq_list

On 9/9/20 12:48 AM, Xianting Tian wrote:
> blk_mq_flush_plug_list() itself will do the empty check of mq_list, so 
> remove such check in blk_flush_plug_list().
> Actually normally mq_list is not empty when blk_flush_plug_list is 
> called.

It's cheaper to do in the caller, instead of doing the function call and then 
aborting if it's empty. So I'd suggest just leaving it alone.
Right now this is the only caller, but it's nicer to assume we can be called in 
any state vs not having the check.

--
Jens Axboe
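
For reference, a minimal sketch of the caller-side pattern Jens prefers (simplified
from the 5.9-era blk_flush_plug_list(); not a verbatim quote): the cheap list_empty()
test sits in the caller, so the common empty case never pays for the call into
blk_mq_flush_plug_list().

void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
	flush_plug_callbacks(plug, from_schedule);

	/* cheap check here avoids the function call when there is no work */
	if (!list_empty(&plug->mq_list))
		blk_mq_flush_plug_list(plug, from_schedule);
}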



RE: [PATCH] blkcg: add plugging support for punt bio

2020-09-08 Thread Tianxianting
Thanks TJ, and thanks for your previous guidance.
I will summarize and resubmit the patch.

-Original Message-
From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
Sent: Wednesday, September 09, 2020 2:03 AM
To: tianxianting (RD) 
Cc: ax...@kernel.dk; cgro...@vger.kernel.org; linux-bl...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] blkcg: add plugging support for punt bio

On Sat, Sep 05, 2020 at 11:25:03AM +, Tianxianting wrote:
> Hi jens,tj
> Could you share a couple of  minutes to comment it?
> I really appreciate it

The result looks fine to me but can you please summarize that in the commit 
message of the patch?

Thanks.

--
tejun


RE: [PATCH] clocksource: Return negative error code in h8300_8timer_init()

2020-09-02 Thread Tianxianting
Hi Markus,
Thanks for your comments.
It is not a real fix; normally, we should return a negative code.
In h8300_8timer_init(), it sets 'ret = -EINVAL' and 'ret = ENXIO'; the latter should be
aligned with a negative code (see the sketch after the snippet below).

static int __init h8300_8timer_init(struct device_node *node)
{

ret = ENXIO;  <<<
base = of_iomap(node, 0);
if (!base) {
pr_err("failed to map registers for clockevent\n");
goto free_clk;
}

ret = -EINVAL;  <<<  
irq = irq_of_parse_and_map(node, 0);
if (!irq) {
pr_err("failed to get irq for clockevent\n");
goto unmap_reg;
}
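
A sketch of the fix being proposed (assumption: simply negate the bare ENXIO so the
error path matches the -EINVAL case below it):

ret = -ENXIO;
base = of_iomap(node, 0);
if (!base) {
	pr_err("failed to map registers for clockevent\n");
	goto free_clk;
}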

-Original Message-
From: Markus Elfring [mailto:markus.elfr...@web.de] 
Sent: Tuesday, September 01, 2020 10:46 PM
To: tianxianting (RD) ; 
uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-kernel@vger.kernel.org; kernel-janit...@vger.kernel.org; Daniel 
Lezcano ; Thomas Gleixner ; 
Yoshinori Sato 
Subject: Re: [PATCH] clocksource: Return negative error code in 
h8300_8timer_init()

> A negative error code should be returned

* Can an other imperative wording become helpful for the change description?

* Would you like to add the tag “Fixes” to the commit message?

Regards,
Markus


RE: [PATCH] [v3] blk-mq: use BLK_MQ_NO_TAG for no tag

2020-09-01 Thread Tianxianting
Hi Jens
Could I get feedback from you on whether this patch can be applied?
Thanks a lot.

-Original Message-
From: tianxianting (RD) 
Sent: Thursday, August 27, 2020 2:34 PM
To: ax...@kernel.dk
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; tianxianting 
(RD) 
Subject: [PATCH] [v3] blk-mq: use BLK_MQ_NO_TAG for no tag

Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.

Signed-off-by: Xianting Tian 
---
 block/blk-core.c | 4 ++--
 block/blk-mq-sched.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c index d9d632639..c7eaf7504 
100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -116,8 +116,8 @@ void blk_rq_init(struct request_queue *q, struct request 
*rq)
rq->__sector = (sector_t) -1;
INIT_HLIST_NODE(>hash);
RB_CLEAR_NODE(>rb_node);
-   rq->tag = -1;
-   rq->internal_tag = -1;
+   rq->tag = BLK_MQ_NO_TAG;
+   rq->internal_tag = BLK_MQ_NO_TAG;
rq->start_time_ns = ktime_get_ns();
rq->part = NULL;
refcount_set(>ref, 1);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 
a19cdf159..439481f59 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -522,7 +522,7 @@ void blk_mq_sched_insert_request(struct request *rq, bool 
at_head,
goto run;
}
 
-   WARN_ON(e && (rq->tag != -1));
+   WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG));
 
if (blk_mq_sched_bypass_insert(hctx, !!e, rq)) {
/*
--
2.17.1



RE: [PATCH] [v2] nvme-pci: check req to prevent crash in nvme_handle_cqe()

2020-09-01 Thread Tianxianting
Hi,
Could I get the feedback for the patch, whether it can be applied or need some 
improvement?
It really can prevent a crash we met.
Thanks:)

-Original Message-
From: tianxianting (RD) 
Sent: Monday, August 31, 2020 6:56 PM
To: kbu...@kernel.org; ax...@fb.com; h...@lst.de; s...@grimberg.me
Cc: linux-n...@lists.infradead.org; linux-kernel@vger.kernel.org; tianxianting 
(RD) 
Subject: [PATCH] [v2] nvme-pci: check req to prevent crash in nvme_handle_cqe()

We met a crash issue when hot-inserting an nvme device: blk_mq_tag_to_rq() returned
null (req=null), then a crash happened in nvme_end_request():
struct nvme_request *rq = nvme_req(req);
rq->result = result;  <==crash here

The test env is: a server configured with 2 backplanes, each backplane supporting 8
nvme devices; this crash happened when hot-inserting an nvme device into the second
backplane. We measured the signal that is sent out of the cpu to ack the nvme
interrupt; the signal was very weak when it reached the second backplane, and the
device couldn't recognize it as an ack signal, so the device couldn't clear the
interrupt flag.
After updating the related driver, the signal sent out of the cpu to the second
backplane is good, and the crash issue disappeared.

As blk_mq_tag_to_rq() may return null, it should be checked for null before use, to
prevent a crash.

[ 1124.256246] nvme nvme5: pci function :e1:00.0
[ 1124.256323] nvme :e1:00.0: enabling device ( -> 0002)
[ 1125.720859] nvme nvme5: 96/0/0 default/read/poll queues
[ 1125.732483]  nvme5n1: p1 p2 p3
[ 1125.788049] BUG: unable to handle kernel NULL pointer dereference at 
0130
[ 1125.788054] PGD 0 P4D 0
[ 1125.788057] Oops: 0002 [#1] SMP NOPTI
[ 1125.788059] CPU: 50 PID: 0 Comm: swapper/50 Kdump: loaded Tainted: G 
  --- -t - 4.18.0-147.el8.x86_64 #1
[ 1125.788065] RIP: 0010:nvme_irq+0xe8/0x240 [nvme]
[ 1125.788068] RSP: 0018:916b8ec83ed0 EFLAGS: 00010813
[ 1125.788069] RAX:  RBX: 918ae9211b00 RCX: 

[ 1125.788070] RDX: 400b RSI:  RDI: 

[ 1125.788071] RBP: 918ae887 R08: 0004 R09: 
918ae887
[ 1125.788072] R10:  R11:  R12: 

[ 1125.788073] R13: 0001 R14:  R15: 
0001
[ 1125.788075] FS:  () GS:916b8ec8() 
knlGS:
[ 1125.788075] CS:  0010 DS:  ES:  CR0: 80050033
[ 1125.788076] CR2: 0130 CR3: 001768f0 CR4: 
00340ee0
[ 1125.788077] Call Trace:
[ 1125.788080]  
[ 1125.788085]  __handle_irq_event_percpu+0x40/0x180
[ 1125.788087]  handle_irq_event_percpu+0x30/0x80
[ 1125.788089]  handle_irq_event+0x36/0x53
[ 1125.788090]  handle_edge_irq+0x82/0x190
[ 1125.788094]  handle_irq+0xbf/0x100
[ 1125.788098]  do_IRQ+0x49/0xd0
[ 1125.788100]  common_interrupt+0xf/0xf

Signed-off-by: Xianting Tian 
---
 drivers/nvme/host/pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 
ba725ae47..5f1c51a43 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -960,6 +960,13 @@ static inline void nvme_handle_cqe(struct nvme_queue 
*nvmeq, u16 idx)
}
 
req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
+   if (unlikely(!req)) {
+   dev_warn(nvmeq->dev->ctrl.device,
+   "req is null(tag:%d) on queue %d\n",
+   cqe->command_id, le16_to_cpu(cqe->sq_id));
+   return;
+   }
+
trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
if (!nvme_end_request(req, cqe->status, cqe->result))
nvme_pci_complete_rq(req);
--
2.17.1



RE: [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe()

2020-08-31 Thread Tianxianting
I am very sorry, I used the wrong patch file :(
I will send it again. Please review, thanks.

-Original Message-
From: kernel test robot [mailto:l...@intel.com] 
Sent: Monday, August 31, 2020 5:41 PM
To: tianxianting (RD) ; kbu...@kernel.org; ax...@fb.com; 
h...@lst.de; s...@grimberg.me
Cc: kbuild-...@lists.01.org; linux-n...@lists.infradead.org; 
linux-kernel@vger.kernel.org; tianxianting (RD) 
Subject: Re: [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe()

Hi Xianting,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.9-rc2]
[also build test ERROR on next-20200828] [cannot apply to linus/master 
v5.9-rc3] [If your patch is applied to the wrong git tree, kindly drop us a 
note.
And when submitting patch, we suggest to use '--base' as documented in 
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xianting-Tian/nvme-pci-check-req-to-prevent-crash-in-nvme_handle_cqe/20200831-155653
base:d012a7190fc1fd72ed48911e77ca97ba4521bccd
config: parisc-randconfig-r004-20200831 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/nvme/host/pci.c: In function 'nvme_handle_cqe':
>> drivers/nvme/host/pci.c:3244: error: unterminated argument list invoking macro "dev_warn"
    3244 | module_exit(nvme_exit);
         |
>> drivers/nvme/host/pci.c:966:3: error: 'dev_warn' undeclared (first use in this function); did you mean '_dev_warn'?
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |   ^~~~~~~~
         |   _dev_warn
   drivers/nvme/host/pci.c:966:3: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/nvme/host/pci.c:966:11: error: expected ';' at end of input
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |           ^
         |           ;
   ......
    3244 | module_exit(nvme_exit);
         |
>> drivers/nvme/host/pci.c:966:3: error: expected declaration or statement at end of input
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |   ^~~~~~~~
>> drivers/nvme/host/pci.c:966:3: error: expected declaration or statement at end of input
   drivers/nvme/host/pci.c: At top level:
   drivers/nvme/host/pci.c:105:13: warning: 'nvme_dev_disable' declared 'static' but never defined [-Wunused-function]
     105 | static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
         |             ^~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:106:13: warning: '__nvme_disable_io_queues' declared 'static' but never defined [-Wunused-function]
     106 | static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:901:13: warning: 'nvme_pci_complete_rq' defined but not used [-Wunused-function]
     901 | static void nvme_pci_complete_rq(struct request *req)
         |             ^~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:853:21: warning: 'nvme_queue_rq' defined but not used [-Wunused-function]
     853 | static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
         |                     ^~~~~~~~~~~~~
   drivers/nvme/host/pci.c:484:13: warning: 'nvme_commit_rqs' defined but not used [-Wunused-function]
     484 | static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
         |             ^~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:427:12: warning: 'nvme_pci_map_queues' defined but not used [-Wunused-function]
     427 | static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
         |            ^~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:403:12: warning: 'nvme_init_request' defined but not used [-Wunused-function]
     403 | static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req,
         |            ^~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:392:12: warning: 'nvme_init_hctx' defined but not used [-Wunused-function]
     392 | static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
         |            ^~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:379:12: warning: 'nvme_admin_init_hctx' defined but not used [-Wunused-function]
     379 | static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
         |            ^~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:371:15: warning: 'nvme_pci_iod_alloc_size' defined but not used [-Wunused-function]
     371 | static size_t nvme_pci_iod_alloc_size(void)
         |               ^~~~~~~~~~~~~~~~~~~~~~~
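
All of the follow-on errors trace back to one problem: the dev_warn() added at drivers/nvme/host/pci.c:966 never closes its argument list, so the preprocessor keeps scanning for the closing ')' all the way down to module_exit(nvme_exit) at line 3244. The actual hunk isn't quoted in this report, but the failure mode presumably looks something like the sketch below (the message text and arguments here are assumptions, not the submitted patch):

/* Broken (sketch): the outer dev_warn( is never closed, so the preprocessor
 * swallows the rest of the file looking for the matching ')'.
 */
dev_warn(nvmeq->dev->ctrl.device,
	 "invalid id %d completed on queue %d\n",
	 cqe->command_id, le16_to_cpu(cqe->sq_id);	/* missing final ')' */

/* Fixed: argument list terminated properly. */
dev_warn(nvmeq->dev->ctrl.device,
	 "invalid id %d completed on queue %d\n",
	 cqe->command_id, le16_to_cpu(cqe->sq_id));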

RE: [PATCH] aio: make aio wait path to account iowait time

2020-08-28 Thread Tianxianting
Thanks peterz, Jan.
So enabling aio iowait time accounting is a bad idea :(
I learned a lot from you, thanks.

-Original Message-
From: pet...@infradead.org [mailto:pet...@infradead.org] 
Sent: Friday, August 28, 2020 6:52 PM
To: Jan Kara 
Cc: tianxianting (RD) ; v...@zeniv.linux.org.uk; 
b...@kvack.org; mi...@redhat.com; juri.le...@redhat.com; 
vincent.guit...@linaro.org; dietmar.eggem...@arm.com; rost...@goodmis.org; 
bseg...@google.com; mgor...@suse.de; linux-fsde...@vger.kernel.org; 
linux-...@kvack.org; linux-kernel@vger.kernel.org; Tejun Heo ; 
han...@cmpxchg.org
Subject: Re: [PATCH] aio: make aio wait path to account iowait time

On Fri, Aug 28, 2020 at 11:41:29AM +0200, Jan Kara wrote:
> On Fri 28-08-20 11:07:29, pet...@infradead.org wrote:
> > On Fri, Aug 28, 2020 at 02:07:12PM +0800, Xianting Tian wrote:
> > > As the normal aio wait path (read_events() ->
> > > wait_event_interruptible_hrtimeout()) doesn't account iowait time,
> > > use this patch to make it account iowait time, which can
> > > truly reflect the system IO situation when using a tool like 'top'.
> > 
> > Do be aware though that io_schedule() is potentially far more 
> > expensive than regular schedule() and io-wait accounting as a whole 
> > is a trainwreck.
> 
> Hum, I didn't know that io_schedule() is that much more expensive. 
> Thanks for info.

It's all relative, but it can add up under contention. And since these storage 
thingies are getting faster every year, I'm assuming these schedule rates are 
increasing along with it.
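
For readers following the thread: the extra cost comes from the in_iowait handling around the sleep. A simplified sketch of the io_schedule() path (condensed from kernel/sched/core.c of that era; not verbatim, details vary by version):

int io_schedule_prepare(void)
{
	int old_iowait = current->in_iowait;

	current->in_iowait = 1;			/* mark this sleep as IO wait */
	blk_schedule_flush_plug(current);	/* flush any plugged block IO */
	return old_iowait;
}

void io_schedule_finish(int token)
{
	current->in_iowait = token;
}

void io_schedule(void)
{
	int token = io_schedule_prepare();

	/* with in_iowait set, schedule()/ttwu() additionally maintain
	 * rq->nr_iowait with atomic ops and run delayacct/PSI hooks */
	schedule();
	io_schedule_finish(token);
}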

> > When in_iowait is set schedule() and ttwu() will have to do 
> > additional atomic ops, and (much) worse, PSI will take additional locks.
> > 
> > And all that for a number that, IMO, is mostly useless, see the 
> > comment with nr_iowait().
> 
> Well, I understand the limited usefulness of the system or even per 
> CPU percentage spent in IO wait. However whether a particular task is 
> sleeping waiting for IO or not

So strict per-task state is not a problem, and we could easily change
get_task_state() to distinguish between IO-wait or not, basically duplicate S/D 
state into an IO-wait variant of the same. Although even this has ABI 
implications :-(
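
(Purely as an illustration of that idea, not an actual proposal: a hypothetical per-task state helper that splits S/D into IO-wait variants could look like the sketch below; the extra state letters are invented here, which is exactly where the ABI concern comes in.)

/* Hypothetical sketch only -- not kernel code. */
static char task_state_char(struct task_struct *tsk)
{
	long state = READ_ONCE(tsk->state);
	bool iowait = tsk->in_iowait;		/* set across io_schedule() */

	if (state & TASK_UNINTERRUPTIBLE)
		return iowait ? 'E' : 'D';	/* 'E': made-up "D, in IO wait" */
	if (state & TASK_INTERRUPTIBLE)
		return iowait ? 'W' : 'S';	/* 'W': made-up "S, in IO wait" */
	return 'R';
}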

> is IMO a useful diagnostic information and there are several places in 
> the kernel that take that into account (PSI, hangcheck timer, cpufreq, 
> ...).

So PSI is the one I hate most. We spend an awful lot of time to not have to 
take the old rq->lock on wakeup, and PSI reintroduced it for accounting 
purposes -- I hate accounting overhead. :/

There's a number of high frequency scheduling workloads where it really adds 
up, which is the reason we got rid of it in the first place.

OTOH, PSI gives more sensible numbers, although it goes side-ways when you 
introduce affinity masks / cpusets.

The menu cpufreq governor is known to be crazy and we're all working hard on replacing it.

And the tick-sched usage is, iirc, the nohz case of iowait.

> So I don't see that properly accounting that a task is waiting for IO 
> is just "expensive random number generator" as you mention below :). 
> But I'm open to being educated...

It's the userspace iowait, and in particular the per-cpu iowait numbers that I 
hate. Only on UP does any of that make sense.

But we can't remove them because ABI :-(



RE: [PATCH] [v2] blk-mq: use BLK_MQ_NO_TAG for no tag

2020-08-26 Thread Tianxianting
Hi Ming Lei,
Thanks for your quick comment.
As the function request_to_qc_t() in 'include/linux/blk-mq.h' uses the magic '-1',
it seems hard to replace it with BLK_MQ_NO_TAG :(

-Original Message-
From: Ming Lei [mailto:ming@redhat.com] 
Sent: Wednesday, August 26, 2020 12:29 PM
To: tianxianting (RD) 
Cc: ax...@kernel.dk; linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [v2] blk-mq: use BLK_MQ_NO_TAG for no tag

On Wed, Aug 26, 2020 at 10:06:51AM +0800, Xianting Tian wrote:
> Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.
> And move the definition of BLK_MQ_NO_TAG from 'block/blk-mq-tag.h'
> to 'include/linux/blk-mq.h'

All three symbols are supposed to be for block core internal code only, so it looks 
like you shouldn't move them to a public header.


Thanks,
Ming



RE: [PATCH] blk-mq: use BLK_MQ_NO_TAG for no tag

2020-08-24 Thread Tianxianting
Hi Jens,
Sorry to trouble you; I am very sorry for taking the patch's compilation for granted.

I checked; compilation of the two changed files is OK:
block/blk-core.c
block/blk-mq-sched.c

Compilation failed for this function in the header file below:
include/linux/blk-mq.h: request_to_qc_t()
-   if (rq->tag != -1)
+   if (rq->tag != BLK_MQ_NO_TAG)

BLK_MQ_NO_TAG is defined in 'block/blk-mq-tag.h'. If I include that header 
in 'include/linux/blk-mq.h' via '#include "../../block/blk-mq-tag.h"', 
many other similar compile failures happen, for example:
In file included from ./include/linux/../../block/blk-mq-tag.h:5:0,
 from ./include/linux/blk-mq.h:8,
 from block/blk.h:6,
 from block/bio.c:24:
./include/linux/../../block/blk-mq.h:21:29: error: ‘HCTX_MAX_TYPES’ undeclared here (not in a function); did you mean ‘KOBJ_NS_TYPES’?
   struct list_head rq_lists[HCTX_MAX_TYPES];
                             ^~~~~~~~~~~~~~
If I move the BLK_MQ_NO_TAG definition below to 'include/linux/blk-mq.h', the whole 
kernel compiles OK with the patch:
enum {
BLK_MQ_NO_TAG   = -1U,
BLK_MQ_TAG_MIN  = 1,
BLK_MQ_TAG_MAX  = BLK_MQ_NO_TAG - 1,
};
Will you accept the above move of BLK_MQ_NO_TAG to 'include/linux/blk-mq.h'?
Thanks
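
For context, the helper in question looks roughly like this in the include/linux/blk-mq.h of that era (reconstructed from memory, so treat it as a sketch); with the enum moved as above, the magic '-1' there can be replaced too:

/* include/linux/blk-mq.h, approximate v5.8/v5.9-era code */
static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx,
				       struct request *rq)
{
	if (rq->tag != BLK_MQ_NO_TAG)	/* previously: rq->tag != -1 */
		return rq->tag | (hctx->queue_num << BLK_QC_T_SHIFT);

	return rq->internal_tag | (hctx->queue_num << BLK_QC_T_SHIFT) |
			BLK_QC_T_INTERNAL;
}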

-Original Message-
From: Jens Axboe [mailto:ax...@kernel.dk] 
Sent: Monday, August 24, 2020 4:58 AM
To: tianxianting (RD) ; a...@kernel.org; 
dan...@iogearbox.net; ka...@fb.com; songliubrav...@fb.com; y...@fb.com; 
andr...@fb.com; john.fastab...@gmail.com; kpsi...@chromium.org
Cc: linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] blk-mq: use BLK_MQ_NO_TAG for no tag

On 8/23/20 9:44 AM, Xianting Tian wrote:
> Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.

Doesn't look like this patch was even compiled...

-- 
Jens Axboe



Re: [PATCH] timers: use set_current_state macro

2020-05-14 Thread Tianxianting
Please review this one, thanks.
https://lkml.org/lkml/2020/5/14/796

-Original Message-
From: tianxianting (RD)
Sent: Thursday, May 14, 2020 9:46 AM
To: 'Thomas Gleixner' ; john.stu...@linaro.org; sb...@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] timers: use set_current_state macro

Hi Thomas,
Thanks for your reply :)
I am sorry for the inconvenience.
tian.xiant...@h3c.com is the mail address provided by my company; the company 
may add some restrictions when I send mail outside of it. I will re-send 
the patch from my personal mail; I used it to send several patches before, and 
those patches were accepted.

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de]
Sent: Thursday, May 14, 2020 5:22 AM
To: tianxianting (RD) ; john.stu...@linaro.org; sb...@kernel.org
Cc: linux-kernel@vger.kernel.org; tianxianting (RD) 
Subject: Re: [PATCH] timers: use set_current_state macro

Xianting,

Xianting Tian  writes:

thanks for your patch.

Can you please fix your mail client to have proper mail headers? It
provides:

  Content-Type: text/plain
  Content-Transfer-Encoding: quoted-printable

but it fails to provide the charset information. That causes the footer to 
become unreadable garbage not only in my mail reader. See:

  https://lore.kernel.org/lkml/20200508020222.15791-1-tian.xiant...@h3c.com/

What's worse is that it causes my patch handling scripts to fail to decode the mail 
body correctly. And I'm not really inclined to figure out how to handle this 
case.

> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1885,7 +1885,7 @@ signed long __sched schedule_timeout(signed long 
> timeout)
> printk(KERN_ERR "schedule_timeout: wrong timeout "
> "value %lx\n", timeout);
> dump_stack();
> -   current->state = TASK_RUNNING;
> +   set_current_state(TASK_RUNNING);

That's not the same and adds a barrier which is not needed.

Not a big problem in that particular error handling code path, but in general 
you really have to look whether your replacement is resulting in the same code.

If not then you need to make an argument in the changelog why you are replacing 
existing code with something which is not fully equivalent.

For this particular case, please check the implementation and read the 
documentation of set_current_state() in include/linux/sched.h.
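
For reference, the difference Thomas is pointing at, in simplified form (loosely based on the non-debug definitions in include/linux/sched.h of that era; check the exact tree you are working on):

/* Plain assignment: just a store, no memory barrier. */
current->state = TASK_RUNNING;

/* __set_current_state(): the like-for-like replacement, still a plain store. */
#define __set_current_state(state_value)	\
	current->state = (state_value)

/* set_current_state(): store plus full barrier (smp_store_mb()), needed when
 * the state write must be ordered against a later condition check in the
 * sleep/wakeup pattern -- unnecessary in this error path, which only forces
 * the task back to TASK_RUNNING. */
#define set_current_state(state_value)		\
	smp_store_mb(current->state, (state_value))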

> --
> ---
> ±¾Óʼþ¼°Æ丽¼þº¬ÓÐлªÈý¼¯Íŵı£ÃÜÐÅÏ¢£¬½öÏÞÓÚ·¢Ë͸øÉÏÃæµØÖ·ÖÐÁгö
> µÄ¸öÈË»òȺ×é¡£½ûÖ¹ÈκÎÆäËûÈËÒÔÈκÎÐÎʽʹÓ㨰üÀ¨µ«²»ÏÞÓÚÈ«²¿»ò²¿·ÖµØй
> ¶¡¢¸´ÖÆ¡¢
> »òÉ¢·¢£©±¾ÓʼþÖеÄÐÅÏ¢¡£Èç¹ûÄú´íÊÕÁ˱¾Óʼþ£¬ÇëÄúÁ¢¼´µç»°»òÓʼþ֪ͨ·¢¼þ
> È˲¢É¾³ý±¾
> Óʼþ£¡

This is the resulting garbage. Not that I could decipher the chinese characters 
which should be here instead, but at least they would look way nicer. But see 
below:

> This e-mail and its attachments contain confidential information from
> New H3C, which is intended only for the person or entity whose address
> is listed above. Any use of the information contained herein in any
> way (including, but not limited to, total or partial disclosure,
> reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error,
> please notify the sender by phone or email immediately and delete it!

Can you please remove this disclaimer completely (which avoids the garbage 
issue as well) ?

It does not make any sense if you send mail to a public mailing list:

 1) If you send mail to a public list which is archived in public then
the information can't be confidential and restricted to a particular
audience. It can be accessed by everyone on this planet who has
access to the internet.

 2) If you really send confidential information accidentally then there
is no way to delete it. It's out there in the public and in archives
and you can't call it back.

Thanks,

tglx



