[PATCH v6 02/14] media: docs: Add some RGB bus formats for i.MX8qm/qxp pixel combiner

2021-03-16 Thread Liu Ying
This patch adds documentation for the RGB666_1X30_CPADLO, RGB888_1X30_CPADLO,
RGB666_1X36_CPADLO and RGB888_1X36_CPADLO bus formats used by the i.MX8qm/qxp
pixel combiner.  The RGB pixels, padded low per component, are transmitted
on a 30-bit input bus (10 bits per component) from a display controller or
a 36-bit output bus (12 bits per component) to a pixel link.
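
For quick reference, the new format codes documented below would pair with
defines along these lines (a sketch; the header hunk itself is not part of
this excerpt, but the values match the tables added below):

	/* Assumed to live in include/uapi/linux/media-bus-format.h. */
	#define MEDIA_BUS_FMT_RGB666_1X30_CPADLO	0x101e
	#define MEDIA_BUS_FMT_RGB888_1X30_CPADLO	0x101f
	#define MEDIA_BUS_FMT_RGB666_1X36_CPADLO	0x1020
	#define MEDIA_BUS_FMT_RGB888_1X36_CPADLO	0x1021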

Reviewed-by: Robert Foss 
Reviewed-by: Laurent Pinchart 
Signed-off-by: Liu Ying 
---
Robert, I keep your R-b tag from v5. Let me know if you want me to drop it, as
v6 contains a fix.

v5->v6:
* Fix data organization of MEDIA_BUS_FMT_RGB{666,888}_1X30_CPADLO. (Laurent)
* Add Laurent's R-b tag.

v4->v5:
* Add Robert's R-b tag.

v3->v4:
* No change.

v2->v3:
* No change.

v1->v2:
* No change.

 .../userspace-api/media/v4l/subdev-formats.rst | 156 +
 1 file changed, 156 insertions(+)

diff --git a/Documentation/userspace-api/media/v4l/subdev-formats.rst b/Documentation/userspace-api/media/v4l/subdev-formats.rst
index 7f16cbe..1402e18 100644
--- a/Documentation/userspace-api/media/v4l/subdev-formats.rst
+++ b/Documentation/userspace-api/media/v4l/subdev-formats.rst
@@ -1488,6 +1488,80 @@ The following tables list existing packed RGB formats.
   - b\ :sub:`2`
   - b\ :sub:`1`
   - b\ :sub:`0`
+* .. _MEDIA-BUS-FMT-RGB666-1X30-CPADLO:
+
+  - MEDIA_BUS_FMT_RGB666_1X30_CPADLO
+  - 0x101e
+  -
+  -
+  -
+  - r\ :sub:`5`
+  - r\ :sub:`4`
+  - r\ :sub:`3`
+  - r\ :sub:`2`
+  - r\ :sub:`1`
+  - r\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - g\ :sub:`5`
+  - g\ :sub:`4`
+  - g\ :sub:`3`
+  - g\ :sub:`2`
+  - g\ :sub:`1`
+  - g\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - b\ :sub:`5`
+  - b\ :sub:`4`
+  - b\ :sub:`3`
+  - b\ :sub:`2`
+  - b\ :sub:`1`
+  - b\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+* .. _MEDIA-BUS-FMT-RGB888-1X30-CPADLO:
+
+  - MEDIA_BUS_FMT_RGB888_1X30_CPADLO
+  - 0x101f
+  -
+  -
+  -
+  - r\ :sub:`7`
+  - r\ :sub:`6`
+  - r\ :sub:`5`
+  - r\ :sub:`4`
+  - r\ :sub:`3`
+  - r\ :sub:`2`
+  - r\ :sub:`1`
+  - r\ :sub:`0`
+  - 0
+  - 0
+  - g\ :sub:`7`
+  - g\ :sub:`6`
+  - g\ :sub:`5`
+  - g\ :sub:`4`
+  - g\ :sub:`3`
+  - g\ :sub:`2`
+  - g\ :sub:`1`
+  - g\ :sub:`0`
+  - 0
+  - 0
+  - b\ :sub:`7`
+  - b\ :sub:`6`
+  - b\ :sub:`5`
+  - b\ :sub:`4`
+  - b\ :sub:`3`
+  - b\ :sub:`2`
+  - b\ :sub:`1`
+  - b\ :sub:`0`
+  - 0
+  - 0
 * .. _MEDIA-BUS-FMT-ARGB888-1X32:
 
   - MEDIA_BUS_FMT_ARGB888_1X32
@@ -1665,6 +1739,88 @@ The following table list existing packed 36bit wide RGB formats.
   - 2
   - 1
   - 0
+* .. _MEDIA-BUS-FMT-RGB666-1X36-CPADLO:
+
+  - MEDIA_BUS_FMT_RGB666_1X36_CPADLO
+  - 0x1020
+  -
+  - r\ :sub:`5`
+  - r\ :sub:`4`
+  - r\ :sub:`3`
+  - r\ :sub:`2`
+  - r\ :sub:`1`
+  - r\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - g\ :sub:`5`
+  - g\ :sub:`4`
+  - g\ :sub:`3`
+  - g\ :sub:`2`
+  - g\ :sub:`1`
+  - g\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - b\ :sub:`5`
+  - b\ :sub:`4`
+  - b\ :sub:`3`
+  - b\ :sub:`2`
+  - b\ :sub:`1`
+  - b\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+  - 0
+* .. _MEDIA-BUS-FMT-RGB888-1X36-CPADLO:
+
+  - MEDIA_BUS_FMT_RGB888_1X36_CPADLO
+  - 0x1021
+  -
+  - r\ :sub:`7`
+  - r\ :sub:`6`
+  - r\ :sub:`5`
+  - r\ :sub:`4`
+  - r\ :sub:`3`
+  - r\ :sub:`2`
+  - r\ :sub:`1`
+  - r\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - g\ :sub:`7`
+  - g\ :sub:`6`
+  - g\ :sub:`5`
+  - g\ :sub:`4`
+  - g\ :sub:`3`
+  - g\ :sub:`2`
+  - g\ :sub:`1`
+  - g\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
+  - b\ :sub:`7`
+  - b\ :sub:`6`
+  - b\ :sub:`5`
+  - b\ :sub:`4`
+  - b\ :sub:`3`
+  - b\ :sub:`2`
+  - b\ :sub:`1`
+  - b\ :sub:`0`
+  - 0
+  - 0
+  - 0
+  - 0
 * .. _MEDIA-BUS-FMT-RGB121212-1X36:
 
   - MEDIA_BUS_FMT_RGB121212_1X36
-- 
2.7.4



[PATCH 1/2] erofs: use workqueue decompression for atomic contexts only

2021-03-16 Thread Huang Jianan
z_erofs_decompressqueue_endio may not be executed in atomic context,
for example, when dm-verity is turned on. In this scenario, data can be
decompressed directly, getting rid of the additional kworker scheduling
overhead.
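
A minimal sketch of the resulting dispatch decision (mirroring the hunk
below; z_erofs_decompressqueue_work() is the existing kworker handler):

	/* Atomic context (e.g. bio completion in softirq): defer. */
	if (in_atomic() || irqs_disabled()) {
		queue_work(z_erofs_workqueue, &io->u.work);
		return;
	}
	/* Otherwise decompress inline and skip the kworker round-trip. */
	z_erofs_decompressqueue_work(&io->u.work);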

Signed-off-by: Huang Jianan 
Signed-off-by: Guo Weichao 
Reviewed-by: Gao Xiang 
---
 fs/erofs/zdata.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 6cb356c4217b..cf2d28582c14 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -706,6 +706,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
goto out;
 }
 
+static void z_erofs_decompressqueue_work(struct work_struct *work);
 static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
   bool sync, int bios)
 {
@@ -720,8 +721,14 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
return;
}
 
-   if (!atomic_add_return(bios, &io->pending_bios))
+   if (atomic_add_return(bios, &io->pending_bios))
+   return;
+   /* Use workqueue decompression for atomic contexts only */
+   if (in_atomic() || irqs_disabled()) {
    queue_work(z_erofs_workqueue, &io->u.work);
+   return;
+   }
+   z_erofs_decompressqueue_work(&io->u.work);
 }
 
 static bool z_erofs_page_is_invalidated(struct page *page)
-- 
2.25.1



[PATCH 2/2] erofs: use sync decompression for atomic contexts only

2021-03-16 Thread Huang Jianan
Sync decompression was introduced to get rid of additional kworker
scheduling overhead. But there is no such overhead in non-atomic
contexts, so it is better to turn off sync decompression there and avoid
making the current thread wait in z_erofs_runqueue.
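
In other words, readahead only uses sync decompression once the endio
path has actually been seen running in atomic context; a sketch of the
gate, with names taken from the hunks below:

	/* readahead_sync_decompress stays false until an atomic-context
	 * endio proves kworker scheduling is in play (see kickoff hunk). */
	bool sync = sbi->ctx.readahead_sync_decompress &&
		    nr_pages <= sbi->ctx.max_sync_decompress_pages;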

Signed-off-by: Huang Jianan 
Signed-off-by: Guo Weichao 
Reviewed-by: Gao Xiang 
---
 fs/erofs/internal.h | 2 ++
 fs/erofs/super.c| 1 +
 fs/erofs/zdata.c| 8 ++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 67a7ec945686..fbc4040715be 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -50,6 +50,8 @@ struct erofs_fs_context {
 #ifdef CONFIG_EROFS_FS_ZIP
/* current strategy of how to use managed cache */
unsigned char cache_strategy;
+   /* strategy of sync decompression (false - auto, true - force on) */
+   bool readahead_sync_decompress;
 
/* threshold for decompression synchronously */
unsigned int max_sync_decompress_pages;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index d5a6b9b888a5..0445d09b6331 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -200,6 +200,7 @@ static void erofs_default_options(struct erofs_fs_context *ctx)
 #ifdef CONFIG_EROFS_FS_ZIP
ctx->cache_strategy = EROFS_ZIP_CACHE_READAROUND;
ctx->max_sync_decompress_pages = 3;
+   ctx->readahead_sync_decompress = false;
 #endif
 #ifdef CONFIG_EROFS_FS_XATTR
set_opt(ctx, XATTR_USER);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index cf2d28582c14..25a0c4890d0a 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -710,6 +710,8 @@ static void z_erofs_decompressqueue_work(struct work_struct *work);
 static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
   bool sync, int bios)
 {
+   struct erofs_sb_info *const sbi = EROFS_SB(io->sb);
+
/* wake up the caller thread for sync decompression */
if (sync) {
unsigned long flags;
@@ -723,9 +725,10 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
 
	if (atomic_add_return(bios, &io->pending_bios))
		return;
-   /* Use workqueue decompression for atomic contexts only */
+   /* Use workqueue and sync decompression for atomic contexts only */
	if (in_atomic() || irqs_disabled()) {
		queue_work(z_erofs_workqueue, &io->u.work);
+   sbi->ctx.readahead_sync_decompress = true;
		return;
	}
	z_erofs_decompressqueue_work(&io->u.work);
@@ -1340,7 +1343,8 @@ static void z_erofs_readahead(struct readahead_control *rac)
struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 
unsigned int nr_pages = readahead_count(rac);
-   bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages);
+   bool sync = (sbi->ctx.readahead_sync_decompress &&
+   nr_pages <= sbi->ctx.max_sync_decompress_pages);
struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
struct page *page, *head = NULL;
LIST_HEAD(pagepool);
-- 
2.25.1



[PATCH 0/2] erofs: decompress in endio if possible

2021-03-16 Thread Huang Jianan
This patch set was separated from "erofs: decompress in endio if possible",
since that patch did two things:
- combine dm-verity and erofs workqueue
- change the policy of decompression in thread context

Huang Jianan (2):
  erofs: use workqueue decompression for atomic contexts only
  erofs: use sync decompression for atomic contexts only

 fs/erofs/internal.h |  2 ++
 fs/erofs/super.c|  1 +
 fs/erofs/zdata.c| 15 +--
 3 files changed, 16 insertions(+), 2 deletions(-)

-- 
2.25.1



Re: [PATCH v6 2/2] erofs: decompress in endio if possible

2021-03-16 Thread Huang Jianan



On 2021/3/16 16:26, Chao Yu wrote:

Hi Jianan,

On 2021/3/16 11:15, Huang Jianan via Linux-erofs wrote:

z_erofs_decompressqueue_endio may not be executed in atomic context,
for example, when dm-verity is turned on. In this scenario, data can be
decompressed directly to get rid of additional kworker scheduling
overhead. Also, it makes no sense to apply synchronous decompression in
such a case.


It looks like this patch does more than one thing:
- combine dm-verity and erofs workqueue
- change the policy of decompression in thread context

Normally, we do one thing in one patch; that way, we benefit when
backporting patches and when bisecting a problematic patch with minimum
granularity, and it also helps reviewers focus on a single piece of code
logic that follows the patch's goal.

So IMO, it would be better to separate this patch into two.


Thanks for the suggestion, I will send a new patch set.

One more thing: could you explain a bit more about why we need to change
the decompression policy in thread context? Is it for better performance?


Sync decompression was introduced to get rid of additional kworker
scheduling overhead. But there is no such overhead if we decompress
directly in z_erofs_decompressqueue_endio. Therefore, it should be
better to turn off sync decompression to avoid the current thread
waiting in z_erofs_runqueue.


BTW, code looks clean to me. :)

Thanks,



Signed-off-by: Huang Jianan 
Signed-off-by: Guo Weichao 
Reviewed-by: Gao Xiang 
---
  fs/erofs/internal.h |  2 ++
  fs/erofs/super.c    |  1 +
  fs/erofs/zdata.c    | 15 +--
  3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 67a7ec945686..fbc4040715be 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -50,6 +50,8 @@ struct erofs_fs_context {
  #ifdef CONFIG_EROFS_FS_ZIP
  /* current strategy of how to use managed cache */
  unsigned char cache_strategy;
+    /* strategy of sync decompression (false - auto, true - force 
on) */

+    bool readahead_sync_decompress;
    /* threshold for decompression synchronously */
  unsigned int max_sync_decompress_pages;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index d5a6b9b888a5..0445d09b6331 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -200,6 +200,7 @@ static void erofs_default_options(struct 
erofs_fs_context *ctx)

  #ifdef CONFIG_EROFS_FS_ZIP
  ctx->cache_strategy = EROFS_ZIP_CACHE_READAROUND;
  ctx->max_sync_decompress_pages = 3;
+    ctx->readahead_sync_decompress = false;
  #endif
  #ifdef CONFIG_EROFS_FS_XATTR
  set_opt(ctx, XATTR_USER);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 6cb356c4217b..25a0c4890d0a 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -706,9 +706,12 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,

  goto out;
  }
  +static void z_erofs_decompressqueue_work(struct work_struct *work);
  static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,

 bool sync, int bios)
  {
+    struct erofs_sb_info *const sbi = EROFS_SB(io->sb);
+
  /* wake up the caller thread for sync decompression */
  if (sync) {
  unsigned long flags;
@@ -720,8 +723,15 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,

  return;
  }
  -    if (!atomic_add_return(bios, &io->pending_bios))
+    if (atomic_add_return(bios, &io->pending_bios))
+    return;
+    /* Use workqueue and sync decompression for atomic contexts only */
+    if (in_atomic() || irqs_disabled()) {
   queue_work(z_erofs_workqueue, &io->u.work);
+    sbi->ctx.readahead_sync_decompress = true;
+    return;
+    }
+    z_erofs_decompressqueue_work(&io->u.work);
  }
    static bool z_erofs_page_is_invalidated(struct page *page)
@@ -1333,7 +1343,8 @@ static void z_erofs_readahead(struct readahead_control *rac)

  struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
    unsigned int nr_pages = readahead_count(rac);
-    bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages);
+    bool sync = (sbi->ctx.readahead_sync_decompress &&
+    nr_pages <= sbi->ctx.max_sync_decompress_pages);
  struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);

  struct page *page, *head = NULL;
  LIST_HEAD(pagepool);



Re: [PATCH] staging: rtl8723bs/core: add spaces between operators

2021-03-16 Thread Joe Perches
On Tue, 2021-03-16 at 20:05 +0800, Qiang Ma wrote:
> Add spaces between operators for better readability
> in function 'rtw_seccalctkipmic'.

Perhaps it would be better to refactor it a bit to
follow the comments.  Something like:
---
 drivers/staging/rtl8723bs/core/rtw_security.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_security.c b/drivers/staging/rtl8723bs/core/rtw_security.c
index a311595deafb..a30e1fa717af 100644
--- a/drivers/staging/rtl8723bs/core/rtw_security.c
+++ b/drivers/staging/rtl8723bs/core/rtw_security.c
@@ -405,30 +405,26 @@ void rtw_secgetmic(struct mic_data *pmicdata, u8 *dst)
 
 void rtw_seccalctkipmic(u8 *key, u8 *header, u8 *data, u32 data_len, u8 *mic_code, u8 pri)
 {
-
struct mic_data micdata;
u8 priority[4] = {0x0, 0x0, 0x0, 0x0};
+   int da_offset;
+   int sa_offset;
 
	rtw_secmicsetkey(&micdata, key);
	priority[0] = pri;
 
	/* Michael MIC pseudo header: DA, SA, 3 x 0, Priority */
-	if (header[1]&1) {   /* ToDS == 1 */
-		rtw_secmicappend(&micdata, &header[16], 6);  /* DA */
-		if (header[1]&2)  /* From Ds == 1 */
-			rtw_secmicappend(&micdata, &header[24], 6);
-		else
-			rtw_secmicappend(&micdata, &header[10], 6);
-	} else {	/* ToDS == 0 */
-		rtw_secmicappend(&micdata, &header[4], 6);   /* DA */
-		if (header[1]&2)  /* From Ds == 1 */
-			rtw_secmicappend(&micdata, &header[16], 6);
-		else
-			rtw_secmicappend(&micdata, &header[10], 6);
+	if (header[1] & 1) {   /* ToDS == 1 */
+		da_offset = 16;
+		sa_offset = (header[1] & 2) ? 24 : 10;
+	} else {	/* ToDS == 0 */
+		da_offset = 4;
+		sa_offset = (header[1] & 2) ? 16 : 10;
 	}
+	rtw_secmicappend(&micdata, &header[da_offset], 6);  /* DA */
+	rtw_secmicappend(&micdata, &header[sa_offset], 6);  /* SA */
	rtw_secmicappend(&micdata, &priority[0], 4);
 
-
	rtw_secmicappend(&micdata, data, data_len);
 
	rtw_secgetmic(&micdata, mic_code);
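
For reference, the DA/SA offsets above follow the IEEE 802.11 ToDS/FromDS
addressing rules (my annotation, not part of the patch):

	/*
	 *  ToDS  FromDS | DA offset  SA offset  (bytes into the header)
	 *   0      0    |     4         10
	 *   0      1    |     4         16
	 *   1      0    |    16         10
	 *   1      1    |    16         24
	 */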




RE: [PATCH v3 05/11] mm, fsdax: Refactor memory-failure handler for dax mapping

2021-03-16 Thread ruansy.f...@fujitsu.com


> -Original Message-
> From: zhong jiang 
> Subject: Re: [PATCH v3 05/11] mm, fsdax: Refactor memory-failure handler for
> dax mapping
> 
> > +int mf_dax_mapping_kill_procs(struct address_space *mapping, pgoff_t index,
> > +   int flags)
> > +{
> > +   const bool unmap_success = true;
> > +   unsigned long pfn, size = 0;
> > +   struct to_kill *tk;
> > +   LIST_HEAD(to_kill);
> > +   int rc = -EBUSY;
> > +   loff_t start;
> > +
> > +   /* load the pfn of the dax mapping file */
> > +   pfn = dax_load_pfn(mapping, index);
> > +   if (!pfn)
> > +   return rc;
> > +   /*
> > +* Unlike System-RAM there is no possibility to swap in a
> > +* different physical page at a given virtual address, so all
> > +* userspace consumption of ZONE_DEVICE memory necessitates
> > +* SIGBUS (i.e. MF_MUST_KILL)
> > +*/
> > +   flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> 
> MF_ACTION_REQUIRED only kills the current execution context. A page can
> be shared when a reflinked file is mapped by different processes. We
> cannot kill all processes sharing the page. Can other processes still
> access the poisoned page?

AFAIK, the other processes will receive a SIGBUS when accessing this
corrupted range.  But I didn't add a testcase for this condition.  I'll
test it.  Thanks for pointing it out.


--
Thanks,
Ruan Shiyang.

> 
> Thanks,
> zhong jiang
> 
> > +   collect_procs_file(pfn_to_page(pfn), mapping, index, &to_kill,
> > +  flags & MF_ACTION_REQUIRED);
> > +
> > +   list_for_each_entry(tk, &to_kill, nd)
> > +   if (tk->size_shift)
> > +   size = max(size, 1UL << tk->size_shift);
> > +   if (size) {
> > +   /*
> > +* Unmap the largest mapping to avoid breaking up
> > +* device-dax mappings which are constant size. The
> > +* actual size of the mapping being torn down is
> > +* communicated in siginfo, see kill_proc()
> > +*/
> > +   start = (index << PAGE_SHIFT) & ~(size - 1);
> > +   unmap_mapping_range(mapping, start, start + size, 0);
> > +   }
> > +
> > +   kill_procs(&to_kill, flags & MF_MUST_KILL, !unmap_success,
> > +  pfn, flags);
> > +   rc = 0;
> > +   return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(mf_dax_mapping_kill_procs);
> > +
> >   static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >   {
> > struct page *p = pfn_to_page(pfn);
> > @@ -1297,7 +1346,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > const bool unmap_success = true;
> > unsigned long size = 0;
> > struct to_kill *tk;
> > -   LIST_HEAD(tokill);
> > +   LIST_HEAD(to_kill);
> > int rc = -EBUSY;
> > loff_t start;
> > dax_entry_t cookie;
> > @@ -1345,9 +1394,10 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> >  * SIGBUS (i.e. MF_MUST_KILL)
> >  */
> > flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> > -   collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED);
> > +   collect_procs_file(page, page->mapping, page->index, &to_kill,
> > +  flags & MF_ACTION_REQUIRED);
> >
> > -   list_for_each_entry(tk, &tokill, nd)
> > +   list_for_each_entry(tk, &to_kill, nd)
> > if (tk->size_shift)
> > size = max(size, 1UL << tk->size_shift);
> > if (size) {
> > @@ -1360,7 +1410,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > start = (page->index << PAGE_SHIFT) & ~(size - 1);
> > unmap_mapping_range(page->mapping, start, start + size, 0);
> > }
> > -   kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags);
> > +   kill_procs(&to_kill, flags & MF_MUST_KILL, !unmap_success, pfn,
> > +  flags);
> > rc = 0;
> >   unlock:
> > dax_unlock_page(page, cookie);
> 



[PATCH v4 13/13] mem/mempolicy: unify mpol_new_preferred() and mpol_new_preferred_many()

2021-03-16 Thread Feng Tang
To reduce some code duplication.

Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 18aa7dc..ee99ecc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -201,32 +201,21 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
return 0;
 }
 
-static int mpol_new_preferred_many(struct mempolicy *pol,
+/* cover both MPOL_PREFERRED and MPOL_PREFERRED_MANY */
+static int mpol_new_preferred(struct mempolicy *pol,
   const nodemask_t *nodes)
 {
if (!nodes)
pol->flags |= MPOL_F_LOCAL; /* local allocation */
else if (nodes_empty(*nodes))
return -EINVAL; /*  no allowed nodes */
-   else
-   pol->nodes = *nodes;
-   return 0;
-}
-
-static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
-{
-   if (nodes) {
+   else {
/* MPOL_PREFERRED can only take a single node: */
-   nodemask_t tmp;
+   nodemask_t tmp = nodemask_of_node(first_node(*nodes));
 
-   if (nodes_empty(*nodes))
-   return -EINVAL;
-
-   tmp = nodemask_of_node(first_node(*nodes));
-   return mpol_new_preferred_many(pol, &tmp);
+   pol->nodes = (pol->mode == MPOL_PREFERRED) ? tmp : *nodes;
}
-
-   return mpol_new_preferred_many(pol, NULL);
+   return 0;
 }
 
 static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
@@ -468,7 +457,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
},
/* [MPOL_LOCAL] - see mpol_new() */
[MPOL_PREFERRED_MANY] = {
-   .create = mpol_new_preferred_many,
+   .create = mpol_new_preferred,
.rebind = mpol_rebind_preferred_many,
},
 };
-- 
2.7.4



[PATCH v4 12/13] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal
admin-guide with this patch. Eventually, the man pages for mbind(2),
get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text
about this mode. Those shall contain the canonical reference.

NUMA systems continue to become more prevalent. New technologies like
PMEM make finer-grained control over memory access patterns increasingly
desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of
nodes that will be tried first when performing allocations. If those
allocations fail, all remaining nodes will be tried. It's a
straightforward API which solves many of the presumptive needs of system
administrators wanting to optimize workloads on such machines. The mode
will work either per VMA, or per thread.

Generally speaking, this is similar to the way MPOL_BIND works, except
the user will only get a SIGSEGV if all nodes in the system are unable
to satisfy the allocation request.
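
As a hypothetical userspace sketch (assuming the new mode value is
exported via the UAPI header, as in the hunk below):

	#include <numaif.h>

	/* Prefer nodes 0 and 2; the kernel falls back to the remaining
	 * nodes instead of failing hard the way MPOL_BIND would. */
	static int prefer_fast_nodes(void)
	{
		unsigned long nodemask = (1UL << 0) | (1UL << 2);

		return set_mempolicy(MPOL_PREFERRED_MANY, &nodemask,
				     sizeof(nodemask) * 8);
	}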

Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 Documentation/admin-guide/mm/numa_memory_policy.rst | 16 
 include/uapi/linux/mempolicy.h  |  6 +++---
 mm/hugetlb.c|  4 ++--
 mm/mempolicy.c  | 14 ++
 4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 1ad020c..fcdaf97 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,14 @@ MPOL_INTERLEAVED
address range or file.  During system boot up, the temporary
interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+This mode specifies that the allocation should be attempted from the
+nodemask specified in the policy. If that allocation fails, the kernel
+will search other nodes, in order of increasing distance from the first
+set bit in the nodemask based on information provided by the platform
+firmware. It is similar to MPOL_PREFERRED with the main exception that
+it is an error to have an empty nodemask.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES
nodes changes after the memory policy has been defined.
 
Without this flag, any time a mempolicy is rebound because of a
-   change in the set of allowed nodes, the node (Preferred) or
-   nodemask (Bind, Interleave) is remapped to the new set of
-   allowed nodes.  This may result in nodes being used that were
-   previously undesired.
+change in the set of allowed nodes, the preferred nodemask (Preferred
+Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+remapped to the new set of allowed nodes.  This may result in nodes
+being used that were previously undesired.
 
With this flag, if the user-specified nodes overlap with the
nodes allowed by the task's cpuset, then the memory policy is
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 8948467..31e 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -16,13 +16,13 @@
  */
 
 /* Policies */
-enum {
-   MPOL_DEFAULT,
+enum { MPOL_DEFAULT,
MPOL_PREFERRED,
MPOL_BIND,
MPOL_INTERLEAVE,
MPOL_LOCAL,
-   MPOL_MAX,   /* always last member of enum */
+   MPOL_PREFERRED_MANY,
+   MPOL_MAX, /* always last member of enum */
 };
 
 /* Flags for set_mempolicy */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9dfbfa3..03ec958 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1126,7 +1126,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, , );
-   if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+   if (mpol->mode == MPOL_PREFERRED_MANY) {
gfp_t gfp_mask1 = gfp_mask | __GFP_NOWARN;
 
gfp_mask1 &= ~__GFP_DIRECT_RECLAIM;
@@ -1893,7 +1893,7 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
nodemask_t *nodemask;
 
nid = huge_node(vma, addr, gfp_mask, , );
-   if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+   if (mpol->mode == MPOL_PREFERRED_MANY) {
gfp_t gfp_mask1 = gfp_mask | __GFP_NOWARN;
 
gfp_mask1 &= ~__GFP_DIRECT_RECLAIM;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 40d32cb..18aa7dc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -108,8 +108,6 @@
 
 #include 

[PATCH v4 11/13] mm/mempolicy: huge-page allocation for many preferred

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

Implement the missing huge page allocation functionality while obeying
the preferred node semantics.

This uses a fallback mechanism to try multiple preferred nodes first,
and then all other nodes. It cannot use the helper function that was
introduced because huge page allocation already has its own helpers, and
consolidating them would have cost more lines of code and effort than it
saved.

The weirdness is that MPOL_PREFERRED_MANY can't be used yet because it
is part of the UAPI we haven't yet exposed. Instead of making that
define global, it is simply changed with the UAPI patch.

[ feng: add NOWARN flag, and skip the direct reclaim to speed up
  allocation in some cases ]
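
The fallback boils down to this two-pass pattern (condensed from the
dequeue_huge_page_vma() hunk below):

	/* Pass 1: preferred nodes only; no reclaim, no warnings. */
	gfp_t gfp_first = (gfp_mask | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM;

	page = dequeue_huge_page_nodemask(h, gfp_first, nid, nodemask);
	if (!page)	/* Pass 2: all nodes, original gfp mask. */
		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);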

Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/hugetlb.c   | 26 +++---
 mm/mempolicy.c |  3 ++-
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8fb42c6..9dfbfa3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1105,7 +1105,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
unsigned long address, int avoid_reserve,
long chg)
 {
-   struct page *page;
+   struct page *page = NULL;
struct mempolicy *mpol;
gfp_t gfp_mask;
nodemask_t *nodemask;
@@ -1126,7 +1126,17 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, , );
-   page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+   if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+   gfp_t gfp_mask1 = gfp_mask | __GFP_NOWARN;
+
+   gfp_mask1 &= ~__GFP_DIRECT_RECLAIM;
+   page = dequeue_huge_page_nodemask(h,
+   gfp_mask1, nid, nodemask);
+   if (!page)
+   page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
+   } else {
+   page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+   }
if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
SetHPageRestoreReserve(page);
h->resv_huge_pages--;
@@ -1883,7 +1893,17 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
nodemask_t *nodemask;
 
nid = huge_node(vma, addr, gfp_mask, , );
-   page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+   if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+   gfp_t gfp_mask1 = gfp_mask | __GFP_NOWARN;
+
+   gfp_mask1 &= ~__GFP_DIRECT_RECLAIM;
+   page = alloc_surplus_huge_page(h,
+   gfp_mask1, nid, nodemask);
+   if (!page)
+   page = alloc_surplus_huge_page(h, gfp_mask, nid, NULL);
+   } else {
+   page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask);
+   }
mpol_cond_put(mpol);
 
return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8fe76a7..40d32cb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2085,7 +2085,8 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
huge_page_shift(hstate_vma(vma)));
} else {
nid = policy_node(gfp_flags, *mpol, numa_node_id());
-   if ((*mpol)->mode == MPOL_BIND)
+   if ((*mpol)->mode == MPOL_BIND ||
+   (*mpol)->mode == MPOL_PREFERRED_MANY)
*nodemask = &(*mpol)->nodes;
}
return nid;
-- 
2.7.4



[PATCH v4 10/13] mm/mempolicy: VMA allocation for many preferred

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

This patch implements MPOL_PREFERRED_MANY for alloc_pages_vma(). Like
alloc_pages_current(), alloc_pages_vma() needs to support policy based
decisions if they've been configured via mbind(2).

The temporary "hack" of treating MPOL_PREFERRED and MPOL_PREFERRED_MANY
can now be removed with this, too.

All the actual machinery to make this work was part of
("mm/mempolicy: Create a page allocator for policy")

Link: https://lore.kernel.org/r/20200630212517.308045-11-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a92efe7..8fe76a7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2273,8 +2273,6 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 {
struct mempolicy *pol;
struct page *page;
-   int preferred_nid;
-   nodemask_t *nmask;
 
pol = get_vma_policy(vma, addr);
 
@@ -2288,6 +2286,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
}
 
if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
+   nodemask_t *nmask;
int hpage_node = node;
 
/*
@@ -2301,10 +2300,26 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 * does not allow the current node in its nodemask, we allocate
 * the standard way.
 */
-   if ((pol->mode == MPOL_PREFERRED ||
-pol->mode == MPOL_PREFERRED_MANY) &&
-   !(pol->flags & MPOL_F_LOCAL))
+   if (pol->mode == MPOL_PREFERRED || !(pol->flags & MPOL_F_LOCAL)) {
hpage_node = first_node(pol->nodes);
+   } else if (pol->mode == MPOL_PREFERRED_MANY) {
+   struct zoneref *z;
+
+   /*
+* In this policy, with direct reclaim, the normal
+* policy based allocation will do the right thing - try
+* twice using the preferred nodes first, and all nodes
+* second.
+*/
+   if (gfp & __GFP_DIRECT_RECLAIM) {
+   page = alloc_pages_policy(pol, gfp, order, NUMA_NO_NODE);
+   goto out;
+   }
+
+   z = first_zones_zonelist(node_zonelist(numa_node_id(), GFP_HIGHUSER),
+    gfp_zone(GFP_HIGHUSER), &pol->nodes);
+   hpage_node = zone_to_nid(z->zone);
+   }
 
nmask = policy_nodemask(gfp, pol);
if (!nmask || node_isset(hpage_node, *nmask)) {
@@ -2330,9 +2345,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
}
}
 
-   nmask = policy_nodemask(gfp, pol);
-   preferred_nid = policy_node(gfp, pol, node);
-   page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
+   page = alloc_pages_policy(pol, gfp, order, NUMA_NO_NODE);
mpol_cond_put(pol);
 out:
return page;
-- 
2.7.4



[PATCH v4 09/13] mm/mempolicy: Thread allocation for many preferred

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

In order to support MPOL_PREFERRED_MANY as the mode used by
set_mempolicy(2), alloc_pages_current() needs to support it. This patch
does that by using the new helper function to allocate properly based on
policy.

All the actual machinery to make this work was part of
("mm/mempolicy: Create a page allocator for policy")

Link: https://lore.kernel.org/r/20200630212517.308045-10-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d21105b..a92efe7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2357,7 +2357,7 @@ EXPORT_SYMBOL(alloc_pages_vma);
 struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 {
	struct mempolicy *pol = &default_policy;
-   struct page *page;
+   int nid = NUMA_NO_NODE;
 
if (!in_interrupt() && !(gfp & __GFP_THISNODE))
pol = get_task_policy(current);
@@ -2367,14 +2367,9 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 * nor system default_policy
 */
if (pol->mode == MPOL_INTERLEAVE)
-   page = alloc_pages_policy(pol, gfp, order,
- interleave_nodes(pol));
-   else
-   page = __alloc_pages_nodemask(gfp, order,
-   policy_node(gfp, pol, numa_node_id()),
-   policy_nodemask(gfp, pol));
+   nid = interleave_nodes(pol);
 
-   return page;
+   return alloc_pages_policy(pol, gfp, order, nid);
 }
 EXPORT_SYMBOL(alloc_pages_current);
 
-- 
2.7.4



[PATCH v4 08/13] mm/mempolicy: Create a page allocator for policy

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

Add a helper function which takes care of handling multiple preferred
nodes. It will be called by future patches that need to handle this,
specifically VMA based page allocation, and task based page allocation.
Huge pages don't quite fit the same pattern because they use different
underlying page allocation functions. This consumes the previous
interleave policy specific allocation function to make a one stop shop
for policy based allocation.

With this, MPOL_PREFERRED_MANY's semantics are more like MPOL_PREFERRED,
in that it will first try the preferred node/nodes and fall back to all
other nodes when the first try fails. Thanks to Michal Hocko for
suggestions on this.

For now, only interleaved policy will be used so there should be no
functional change yet. However, if bisection points to issues in the
next few commits, it was likely the fault of this patch.

Similar functionality is offered via policy_node() and
policy_nodemask(). By themselves however, neither can achieve this
fallback style of sets of nodes.

[ Feng: for the first try, add NOWARN flag, and skip the direct reclaim
  to speed up allocation in some cases ]
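
Viewed from a caller, the helper gives every policy a single entry point
(names from the hunks below; the MPOL_PREFERRED_MANY two-pass fallback is
internal to it):

	/* Interleave still passes an explicit node... */
	page = alloc_pages_policy(pol, gfp, order, interleave_nodes(pol));

	/* ...every other mode lets the helper pick, including the
	 * MPOL_PREFERRED_MANY preferred-then-all-nodes fallback. */
	page = alloc_pages_policy(pol, gfp, order, NUMA_NO_NODE);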

Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 65 ++
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d945f29..d21105b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2187,22 +2187,60 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
return ret;
 }
 
-/* Allocate a page in interleaved policy.
-   Own path because it needs to do special accounting. */
-static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
-   unsigned nid)
+/* Handle page allocation for all but interleaved policies */
+static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
+  unsigned int order, int preferred_nid)
 {
struct page *page;
+   gfp_t gfp_mask = gfp;
 
-   page = __alloc_pages(gfp, order, nid);
-   /* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
-   if (!static_branch_likely(&vm_numa_stat_key))
+   if (pol->mode == MPOL_INTERLEAVE) {
+   page = __alloc_pages(gfp, order, preferred_nid);
+   /* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
+   if (!static_branch_likely(&vm_numa_stat_key))
+   return page;
+   if (page && page_to_nid(page) == preferred_nid) {
+   preempt_disable();
+   __inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
+   preempt_enable();
+   }
return page;
-   if (page && page_to_nid(page) == nid) {
-   preempt_disable();
-   __inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
-   preempt_enable();
}
+
+   VM_BUG_ON(preferred_nid != NUMA_NO_NODE);
+
+   preferred_nid = numa_node_id();
+
+   /*
+* There is a two pass approach implemented here for
+* MPOL_PREFERRED_MANY. In the first pass we try the preferred nodes
+* but allow the allocation to fail. The below table explains how
+* this is achieved.
+*
+* | Policy| preferred nid | nodemask   |
+* |---|---||
+* | MPOL_DEFAULT  | local | NULL   |
+* | MPOL_PREFERRED| best  | NULL   |
+* | MPOL_INTERLEAVE   | ERR   | ERR|
+* | MPOL_BIND | local | pol->nodes |
+* | MPOL_PREFERRED_MANY   | best  | pol->nodes |
+* | MPOL_PREFERRED_MANY (round 2) | local | NULL   |
+* +---+---++
+*/
+   if (pol->mode == MPOL_PREFERRED_MANY) {
+   gfp_mask |=  __GFP_NOWARN;
+
+   /* Skip direct reclaim, as there will be a second try */
+   gfp_mask &= ~__GFP_DIRECT_RECLAIM;
+   }
+
+   page = __alloc_pages_nodemask(gfp_mask, order,
+ policy_node(gfp, pol, preferred_nid),
+ policy_nodemask(gfp, pol));
+
+   if (unlikely(!page && pol->mode == MPOL_PREFERRED_MANY))
+   page = __alloc_pages_nodemask(gfp, order, preferred_nid, NULL);
+
return page;
 }
 
@@ -2244,8 +2282,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned nid;
 
nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
+   page = alloc_pages_policy(pol, gfp, order, 

[PATCH v4 07/13] mm/mempolicy: handle MPOL_PREFERRED_MANY like BIND

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

Begin the real plumbing for handling this new policy. Now that the
internal representation for preferred nodes and bound nodes is the same,
and we can envision what multiple preferred nodes will behave like,
there are obvious places where we can simply reuse the bind behavior.

In v1 of this series, the moral equivalent was:
"mm: Finish handling MPOL_PREFERRED_MANY". Like that, this attempts to
implement the easiest spots for the new policy. Unlike that, this just
reuses BIND.

Link: https://lore.kernel.org/r/20200630212517.308045-8-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eba207e..d945f29 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -963,8 +963,6 @@ static void get_policy_nodemask(struct mempolicy *p, 
nodemask_t *nodes)
switch (p->mode) {
case MPOL_BIND:
case MPOL_INTERLEAVE:
-   *nodes = p->nodes;
-   break;
case MPOL_PREFERRED_MANY:
*nodes = p->nodes;
break;
@@ -1928,7 +1926,8 @@ static int apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 {
/* Lower zones don't get a nodemask applied for MPOL_BIND */
-   if (unlikely(policy->mode == MPOL_BIND) &&
+   if (unlikely(policy->mode == MPOL_BIND ||
+policy->mode == MPOL_PREFERRED_MANY) &&
apply_policy_zone(policy, gfp_zone(gfp)) &&
	cpuset_nodemask_valid_mems_allowed(&policy->nodes))
		return &policy->nodes;
@@ -1984,7 +1983,6 @@ unsigned int mempolicy_slab_node(void)
return node;
 
switch (policy->mode) {
-   case MPOL_PREFERRED_MANY:
case MPOL_PREFERRED:
/*
 * handled MPOL_F_LOCAL above
@@ -1994,6 +1992,7 @@ unsigned int mempolicy_slab_node(void)
case MPOL_INTERLEAVE:
return interleave_nodes(policy);
 
+   case MPOL_PREFERRED_MANY:
case MPOL_BIND: {
struct zoneref *z;
 
@@ -2119,9 +2118,6 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
task_lock(current);
mempolicy = current->mempolicy;
switch (mempolicy->mode) {
-   case MPOL_PREFERRED_MANY:
-   *mask = mempolicy->nodes;
-   break;
case MPOL_PREFERRED:
if (mempolicy->flags & MPOL_F_LOCAL)
nid = numa_node_id();
@@ -2132,6 +2128,7 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
 
case MPOL_BIND:
case MPOL_INTERLEAVE:
+   case MPOL_PREFERRED_MANY:
*mask = mempolicy->nodes;
break;
 
@@ -2175,12 +2172,11 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 * Thus, it's possible for tsk to have allocated memory from
 * nodes in mask.
 */
-   break;
-   case MPOL_PREFERRED_MANY:
ret = nodes_intersects(mempolicy->nodes, *mask);
break;
case MPOL_BIND:
case MPOL_INTERLEAVE:
+   case MPOL_PREFERRED_MANY:
ret = nodes_intersects(mempolicy->nodes, *mask);
break;
default:
@@ -2404,7 +2400,6 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
switch (a->mode) {
case MPOL_BIND:
case MPOL_INTERLEAVE:
-   return !!nodes_equal(a->nodes, b->nodes);
case MPOL_PREFERRED_MANY:
return !!nodes_equal(a->nodes, b->nodes);
case MPOL_PREFERRED:
@@ -2558,6 +2553,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
polnid = first_node(pol->nodes);
break;
 
+   case MPOL_PREFERRED_MANY:
case MPOL_BIND:
/* Optimize placement among multiple nodes via NUMA balancing */
if (pol->flags & MPOL_F_MORON) {
@@ -2580,8 +2576,6 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
polnid = zone_to_nid(z->zone);
break;
 
-   /* case MPOL_PREFERRED_MANY: */
-
default:
BUG();
}
@@ -3094,15 +3088,13 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
switch (mode) {
case MPOL_DEFAULT:
break;
-   case MPOL_PREFERRED_MANY:
-   WARN_ON(flags & MPOL_F_LOCAL);
-   fallthrough;
case MPOL_PREFERRED:
if (flags & MPOL_F_LOCAL)
mode = MPOL_LOCAL;
else
nodes_or(nodes, nodes, pol->nodes);
break;
+   case MPOL_PREFERRED_MANY:
case MPOL_BIND:
case MPOL_INTERLEAVE:
nodes = 

[PATCH v4 06/13] mm/mempolicy: kill v.preferred_nodes

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

Now that preferred_nodes is just a mask, and policies are mutually
exclusive, there is no reason to have a separate mask.

This patch is optional. It definitely helps clean up code in future
patches, but there is no functional difference to leaving it with the
previous name. I do believe it helps demonstrate the exclusivity of the
fields.

Link: https://lore.kernel.org/r/20200630212517.308045-7-ben.widaw...@intel.com
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 include/linux/mempolicy.h |   6 +--
 mm/mempolicy.c| 114 ++
 2 files changed, 56 insertions(+), 64 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 23ee105..ec811c3 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -46,11 +46,7 @@ struct mempolicy {
atomic_t refcnt;
unsigned short mode;/* See MPOL_* above */
unsigned short flags;   /* See set_mempolicy() MPOL_F_* above */
-   union {
-   nodemask_t preferred_nodes; /* preferred */
-   nodemask_t nodes; /* interleave/bind */
-   /* undefined for default */
-   } v;
+   nodemask_t nodes;   /* interleave/bind/many */
union {
nodemask_t cpuset_mems_allowed; /* relative to these nodes */
nodemask_t user_nodemask;   /* nodemask passed by user */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fbfa3ce..eba207e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -199,7 +199,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
 {
if (nodes_empty(*nodes))
return -EINVAL;
-   pol->v.nodes = *nodes;
+   pol->nodes = *nodes;
return 0;
 }
 
@@ -211,7 +211,7 @@ static int mpol_new_preferred_many(struct mempolicy *pol,
else if (nodes_empty(*nodes))
return -EINVAL; /*  no allowed nodes */
else
-   pol->v.preferred_nodes = *nodes;
+   pol->nodes = *nodes;
return 0;
 }
 
@@ -235,7 +235,7 @@ static int mpol_new_bind(struct mempolicy *pol, const 
nodemask_t *nodes)
 {
if (nodes_empty(*nodes))
return -EINVAL;
-   pol->v.nodes = *nodes;
+   pol->nodes = *nodes;
return 0;
 }
 
@@ -352,15 +352,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
else if (pol->flags & MPOL_F_RELATIVE_NODES)
		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
else {
-   nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
-   *nodes);
+   nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
+   *nodes);
pol->w.cpuset_mems_allowed = *nodes;
}
 
if (nodes_empty(tmp))
tmp = *nodes;
 
-   pol->v.nodes = tmp;
+   pol->nodes = tmp;
 }
 
 static void mpol_rebind_preferred_common(struct mempolicy *pol,
@@ -373,17 +373,17 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol,
int node = first_node(pol->w.user_nodemask);
 
if (node_isset(node, *nodes)) {
-   pol->v.preferred_nodes = nodemask_of_node(node);
+   pol->nodes = nodemask_of_node(node);
pol->flags &= ~MPOL_F_LOCAL;
} else
pol->flags |= MPOL_F_LOCAL;
} else if (pol->flags & MPOL_F_RELATIVE_NODES) {
		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
-   pol->v.preferred_nodes = tmp;
+   pol->nodes = tmp;
} else if (!(pol->flags & MPOL_F_LOCAL)) {
-   nodes_remap(tmp, pol->v.preferred_nodes,
-   pol->w.cpuset_mems_allowed, *preferred_nodes);
-   pol->v.preferred_nodes = tmp;
+   nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed,
+   *preferred_nodes);
+   pol->nodes = tmp;
pol->w.cpuset_mems_allowed = *nodes;
}
 }
@@ -963,14 +963,14 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
switch (p->mode) {
case MPOL_BIND:
case MPOL_INTERLEAVE:
-   *nodes = p->v.nodes;
+   *nodes = p->nodes;
break;
case MPOL_PREFERRED_MANY:
-   *nodes = p->v.preferred_nodes;
+   *nodes = p->nodes;
break;
case MPOL_PREFERRED:
if (!(p->flags & MPOL_F_LOCAL))
-   *nodes = p->v.preferred_nodes;
+   *nodes = p->nodes;
/* else return empty node mask for local allocation */
break;
default:
@@ -1056,7 +1056,7 @@ static long do_get_mempolicy(int *policy, nodemask_t 

Re: [PATCH 2/2] riscv: Enable generic clockevent broadcast

2021-03-16 Thread Palmer Dabbelt

On Sat, 06 Mar 2021 18:24:46 PST (-0800), guo...@kernel.org wrote:

From: Guo Ren 

When per-cpu timers are stopped by a deep power saving mode, we need
the system timer's help to broadcast IPI_TIMER.

This was first introduced for broken x86 hardware, where the local APIC
timer stops in C3 state, but many other architectures (powerpc, mips,
arm, hexagon, openrisc, sh) support the same infrastructure to deal
with such power management issues.
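
For context, a per-cpu clockevent device that stops in deep idle
advertises that with CLOCK_EVT_FEAT_C3STOP; that flag is what makes the
generic core arm a broadcast device and use the path wired up below
(illustrative sketch, not part of this patch):

	static struct clock_event_device example_percpu_timer = {
		.name		= "example-percpu-timer",
		/* C3STOP: this timer dies in deep power saving states,
		 * so the core must fall back to tick broadcast and send
		 * IPI_TIMER to wake the cpu's local tick. */
		.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
	};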

Signed-off-by: Guo Ren 
Cc: Arnd Bergmann 
Cc: Thomas Gleixner 
Cc: Daniel Lezcano 
Cc: Anup Patel 
Cc: Atish Patra 
Cc: Palmer Dabbelt 
Cc: Greentime Hu 
---
 arch/riscv/Kconfig  |  2 ++
 arch/riscv/kernel/smp.c | 16 
 2 files changed, 18 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 85d626b8ce5e..8637e7344abe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -28,6 +28,7 @@ config RISCV
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU
+   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
@@ -39,6 +40,7 @@ config RISCV
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
+   select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_EARLY_IOREMAP
select GENERIC_GETTIMEOFDAY if HAVE_GENERIC_VDSO
select GENERIC_IOREMAP
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index ea028d9e0d24..8325d33411d8 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -9,6 +9,7 @@
  */

 #include 
+#include 
 #include 
 #include 
 #include 
@@ -27,6 +28,7 @@ enum ipi_message_type {
IPI_CALL_FUNC,
IPI_CPU_STOP,
IPI_IRQ_WORK,
+   IPI_TIMER,
IPI_MAX
 };

@@ -176,6 +178,12 @@ void handle_IPI(struct pt_regs *regs)
irq_work_run();
}

+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+   if (ops & (1 << IPI_TIMER)) {
+   stats[IPI_TIMER]++;
+   tick_receive_broadcast();
+   }
+#endif
BUG_ON((ops >> IPI_MAX) != 0);

/* Order data access and bit testing. */
@@ -192,6 +200,7 @@ static const char * const ipi_names[] = {
[IPI_CALL_FUNC] = "Function call interrupts",
[IPI_CPU_STOP]  = "CPU stop interrupts",
[IPI_IRQ_WORK]  = "IRQ work interrupts",
+   [IPI_TIMER] = "Timer broadcast interrupts",
 };

 void show_ipi_stats(struct seq_file *p, int prec)
@@ -217,6 +226,13 @@ void arch_send_call_function_single_ipi(int cpu)
send_ipi_single(cpu, IPI_CALL_FUNC);
 }

+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+void tick_broadcast(const struct cpumask *mask)
+{
+   send_ipi_mask(mask, IPI_TIMER);
+}
+#endif
+
 void smp_send_stop(void)
 {
unsigned long timeout;


Thanks, this is on for-next.


[PATCH v4 03/13] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes

2021-03-16 Thread Feng Tang
From: Dave Hansen 

MPOL_PREFERRED honors only a single node set in the nodemask.  Add the
bare define for a new mode which will allow more than one.

The patch does all the plumbing without actually adding the new policy
type.

v2:
Plumb most MPOL_PREFERRED_MANY without exposing UAPI (Ben)
Fixes for checkpatch (Ben)

Link: https://lore.kernel.org/r/20200630212517.308045-4-ben.widaw...@intel.com
Co-developed-by: Ben Widawsky 
Signed-off-by: Ben Widawsky 
Signed-off-by: Dave Hansen 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2b1e0e4..1228d8e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -31,6 +31,9 @@
  *but useful to set in a VMA when you have a non default
  *process policy.
  *
+ * preferred many Try a set of nodes first before normal fallback. This is
+ *similar to preferred without the special case.
+ *
  * defaultAllocate on the local node first, or when on a VMA
  *use the process policy. This is what Linux always did
  *   in a NUMA aware kernel and still does by, ahem, default.
@@ -105,6 +108,8 @@
 
 #include "internal.h"
 
+#define MPOL_PREFERRED_MANY MPOL_MAX
+
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)   /* Skip checks for continuous vmas */
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1) /* Invert check for nodemask */
@@ -175,7 +180,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
 static const struct mempolicy_operations {
int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
-} mpol_ops[MPOL_MAX];
+} mpol_ops[MPOL_MAX + 1];
 
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
@@ -415,7 +420,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
mmap_write_unlock(mm);
 }
 
-static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
+static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
[MPOL_DEFAULT] = {
.rebind = mpol_rebind_default,
},
@@ -432,6 +437,10 @@ static const struct mempolicy_operations 
mpol_ops[MPOL_MAX] = {
.rebind = mpol_rebind_nodemask,
},
/* [MPOL_LOCAL] - see mpol_new() */
+   [MPOL_PREFERRED_MANY] = {
+   .create = NULL,
+   .rebind = NULL,
+   },
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
@@ -924,6 +933,9 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
case MPOL_INTERLEAVE:
*nodes = p->v.nodes;
break;
+   case MPOL_PREFERRED_MANY:
+   *nodes = p->v.preferred_nodes;
+   break;
case MPOL_PREFERRED:
if (!(p->flags & MPOL_F_LOCAL))
*nodes = p->v.preferred_nodes;
@@ -1895,7 +1907,9 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 /* Return the node id preferred by the given mempolicy, or the given id */
 static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 {
-   if (policy->mode == MPOL_PREFERRED && !(policy->flags & MPOL_F_LOCAL)) {
+   if ((policy->mode == MPOL_PREFERRED ||
+policy->mode == MPOL_PREFERRED_MANY) &&
+   !(policy->flags & MPOL_F_LOCAL)) {
nd = first_node(policy->v.preferred_nodes);
} else {
/*
@@ -1938,6 +1952,7 @@ unsigned int mempolicy_slab_node(void)
return node;
 
switch (policy->mode) {
+   case MPOL_PREFERRED_MANY:
case MPOL_PREFERRED:
/*
 * handled MPOL_F_LOCAL above
@@ -2072,6 +2087,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
task_lock(current);
mempolicy = current->mempolicy;
switch (mempolicy->mode) {
+   case MPOL_PREFERRED_MANY:
+   *mask = mempolicy->v.preferred_nodes;
+   break;
case MPOL_PREFERRED:
if (mempolicy->flags & MPOL_F_LOCAL)
nid = numa_node_id();
@@ -2126,6 +2144,9 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
 * nodes in mask.
 */
break;
+   case MPOL_PREFERRED_MANY:
+   ret = nodes_intersects(mempolicy->v.preferred_nodes, *mask);
+   break;
case MPOL_BIND:
case MPOL_INTERLEAVE:
ret = nodes_intersects(mempolicy->v.nodes, *mask);
@@ -2210,10 +2231,13 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 * node and don't fall back to other nodes, as the cost of
 * remote accesses would likely offset THP benefits.
 *
-* If the policy is 

[PATCH v4 04/13] mm/mempolicy: allow preferred code to take a nodemask

2021-03-16 Thread Feng Tang
From: Dave Hansen 

Create a helper function (mpol_new_preferred_many()) which is usable
both by the old, single-node MPOL_PREFERRED and the new
MPOL_PREFERRED_MANY.

Enforce the old single-node MPOL_PREFERRED behavior in the "new"
version of mpol_new_preferred() which calls mpol_new_preferred_many().

v3:
  * fix a stack overflow caused by an empty nodemask (Feng)

Link: https://lore.kernel.org/r/20200630212517.308045-5-ben.widaw...@intel.com
Signed-off-by: Dave Hansen 
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1228d8e..6fb2cab 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -203,17 +203,34 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes)
return 0;
 }
 
-static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
+static int mpol_new_preferred_many(struct mempolicy *pol,
+  const nodemask_t *nodes)
 {
if (!nodes)
pol->flags |= MPOL_F_LOCAL; /* local allocation */
else if (nodes_empty(*nodes))
return -EINVAL; /*  no allowed nodes */
else
-   pol->v.preferred_nodes = nodemask_of_node(first_node(*nodes));
+   pol->v.preferred_nodes = *nodes;
return 0;
 }
 
+static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes)
+{
+   if (nodes) {
+   /* MPOL_PREFERRED can only take a single node: */
+   nodemask_t tmp;
+
+   if (nodes_empty(*nodes))
+   return -EINVAL;
+
+   tmp = nodemask_of_node(first_node(*nodes));
+   return mpol_new_preferred_many(pol, &tmp);
+   }
+
+   return mpol_new_preferred_many(pol, NULL);
+}
+
 static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
 {
if (nodes_empty(*nodes))
-- 
2.7.4



[PATCH v4 05/13] mm/mempolicy: refactor rebind code for PREFERRED_MANY

2021-03-16 Thread Feng Tang
From: Dave Hansen 

Again, this extracts the "only one node must be set" behavior of
MPOL_PREFERRED.  It retains virtually all of the existing code so it can
be used by MPOL_PREFERRED_MANY as well.

v2:
Fixed typos in commit message. (Ben)
Merged bits from other patches. (Ben)
annotate mpol_rebind_preferred_many as unused (Ben)

Link: https://lore.kernel.org/r/20200630212517.308045-6-ben.widaw...@intel.com
Signed-off-by: Dave Hansen 
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6fb2cab..fbfa3ce 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -363,14 +363,11 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
pol->v.nodes = tmp;
 }
 
-static void mpol_rebind_preferred(struct mempolicy *pol,
-   const nodemask_t *nodes)
+static void mpol_rebind_preferred_common(struct mempolicy *pol,
+const nodemask_t *preferred_nodes,
+const nodemask_t *nodes)
 {
nodemask_t tmp;
-   nodemask_t preferred_node;
-
-   /* MPOL_PREFERRED uses only the first node in the mask */
-   preferred_node = nodemask_of_node(first_node(*nodes));
 
if (pol->flags & MPOL_F_STATIC_NODES) {
int node = first_node(pol->w.user_nodemask);
@@ -385,12 +382,30 @@ static void mpol_rebind_preferred(struct mempolicy *pol,
pol->v.preferred_nodes = tmp;
} else if (!(pol->flags & MPOL_F_LOCAL)) {
nodes_remap(tmp, pol->v.preferred_nodes,
-   pol->w.cpuset_mems_allowed, preferred_node);
+   pol->w.cpuset_mems_allowed, *preferred_nodes);
pol->v.preferred_nodes = tmp;
pol->w.cpuset_mems_allowed = *nodes;
}
 }
 
+/* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */
+static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol,
+ const nodemask_t *nodes)
+{
+   mpol_rebind_preferred_common(pol, nodes, nodes);
+}
+
+static void mpol_rebind_preferred(struct mempolicy *pol,
+ const nodemask_t *nodes)
+{
+   nodemask_t preferred_node;
+
+   /* MPOL_PREFERRED uses only the first node in 'nodes' */
+   preferred_node = nodemask_of_node(first_node(*nodes));
+
+   mpol_rebind_preferred_common(pol, &preferred_node, nodes);
+}
+
 /*
  * mpol_rebind_policy - Migrate a policy to a different set of nodes
  *
-- 
2.7.4



[PATCH v4 02/13] mm/mempolicy: convert single preferred_node to full nodemask

2021-03-16 Thread Feng Tang
From: Dave Hansen 

The NUMA APIs currently allow passing in a "preferred node" as a
single bit set in a nodemask.  If more than one bit is set, bits
after the first are ignored.  Internally, this is implemented as
a single integer: mempolicy->preferred_node.

This single node is generally OK for location-based NUMA where
memory being allocated will eventually be operated on by a single
CPU.  However, in systems with multiple memory types, folks want
to target a *type* of memory instead of a location.  For instance,
someone might want some high-bandwidth memory but not care about
the CPU next to which it is allocated.  Or, they want a cheap,
high capacity allocation and want to target all NUMA nodes which
have persistent memory in volatile mode.  In both of these cases,
the application wants to target a *set* of nodes, but does not
want strict MPOL_BIND behavior as that could lead to OOM killer or
SIGSEGV.

To get that behavior, an MPOL_PREFERRED-like mode is desirable, but one
that honors multiple nodes being set in the nodemask.

The first step in that direction is to be able to internally store
multiple preferred nodes, which is implemented in this patch.

This should not introduce any functional changes and just switches the
internal representation of mempolicy->preferred_node from an
integer to a nodemask called 'mempolicy->preferred_nodes'.
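
The round trip between the old single-node view and the new nodemask
representation can be sketched with the existing helpers from
include/linux/nodemask.h (illustration only, not part of the patch):

	nodemask_t preferred_nodes;
	int node;

	/* Old representation: short preferred_node = 2; new one: */
	preferred_nodes = nodemask_of_node(2);	/* only bit 2 set */

	/* Recover the old single-node view where one is still needed: */
	node = first_node(preferred_nodes);	/* node == 2 */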

This is not a pie-in-the-sky dream for an API.  This was a response to a
specific ask of more than one group at Intel.  Specifically:

1. There are existing libraries that target memory types such as
   https://github.com/memkind/memkind.  These are known to suffer
   from SIGSEGV's when memory is low on targeted memory "kinds" that
   span more than one node.  The MCDRAM on a Xeon Phi in "Cluster on
   Die" mode is an example of this.
2. Volatile-use persistent memory users want to have a memory policy
   which is targeted at either "cheap and slow" (PMEM) or "expensive and
   fast" (DRAM).  However, they do not want to experience allocation
   failures when the targeted type is unavailable.
3. Allocate-then-run.  Generally, we let the process scheduler decide
   on which physical CPU to run a task.  That location provides a
   default allocation policy, and memory availability is not generally
   considered when placing tasks.  For situations where memory is
   valuable and constrained, some users want to allocate memory first,
   *then* allocate close compute resources to the allocation.  This is
   the reverse of the normal (CPU) model.  Accelerators such as GPUs
   that operate on core-mm-managed memory are interested in this model.

v2:
Fix spelling errors in commit message. (Ben)
clang-format. (Ben)
Integrated bit from another patch. (Ben)
Update the docs to reflect the internal data structure change (Ben)
Don't advertise MPOL_PREFERRED_MANY in UAPI until we can handle it (Ben)
Added more to the commit message (Dave)

Link: https://lore.kernel.org/r/20200630212517.308045-3-ben.widaw...@intel.com
Co-developed-by: Ben Widawsky 
Signed-off-by: Dave Hansen 
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 .../admin-guide/mm/numa_memory_policy.rst  |  6 ++--
 include/linux/mempolicy.h  |  4 +--
 mm/mempolicy.c | 40 --
 3 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst 
b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 067a90a..1ad020c 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -205,9 +205,9 @@ MPOL_PREFERRED
of increasing distance from the preferred node based on
information provided by the platform firmware.
 
-   Internally, the Preferred policy uses a single node--the
-   preferred_node member of struct mempolicy.  When the internal
-   mode flag MPOL_F_LOCAL is set, the preferred_node is ignored
+   Internally, the Preferred policy uses a nodemask--the
+   preferred_nodes member of struct mempolicy.  When the internal
+   mode flag MPOL_F_LOCAL is set, the preferred_nodes are ignored
and the policy is interpreted as local allocation.  "Local"
allocation policy can be viewed as a Preferred policy that
starts at the node containing the cpu where the allocation
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 5f1c74d..23ee105 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -47,8 +47,8 @@ struct mempolicy {
unsigned short mode;/* See MPOL_* above */
unsigned short flags;   /* See set_mempolicy() MPOL_F_* above */
union {
-   shortpreferred_node; /* preferred */
-   nodemask_t   nodes; /* interleave/bind */
+   nodemask_t preferred_nodes; /* preferred */
+   nodemask_t nodes; /* interleave/bind */
  

[PATCH v4 00/13] Introduced multi-preference mempolicy

2021-03-16 Thread Feng Tang
This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
preference for nodes which will fulfil memory allocation requests. Unlike the
MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
invoke the OOM killer if those preferred nodes are not available.

Along with these patches are patches for libnuma, numactl, numademo, and memhog.
They still need some polish, but can be found here:
https://gitlab.com/bwidawsk/numactl/-/tree/prefer-many
It allows new usage: `numactl -P 0,3,4`

The goal of the new mode is to enable some use-cases when using tiered memory
usage models, which I've lovingly named:
1a. The Hare - The interconnect is fast enough to meet bandwidth and latency
requirements allowing preference to be given to all nodes with "fast" memory.
1b. The Indiscriminate Hare - An application knows it wants fast memory (or
perhaps slow memory), but doesn't care which node it runs on. The application
can prefer a set of nodes and then bind its xpu (cpu, accelerator, etc.) to
the local node. This reverses how nodes are chosen today, where the kernel
attempts to use memory local to the CPU whenever possible. This will instead
attempt to use the accelerator local to the memory.
2. The Tortoise - The administrator (or the application itself) is aware it only
needs slow memory, and so can prefer that.

Much of this is almost achievable with the bind interface, but the bind
interface suffers from an inability to fallback to another set of nodes if
binding fails to all nodes in the nodemask.

Like MPOL_BIND a nodemask is given. Inherently this removes ordering from the
preference.

> /* Set first two nodes as preferred in an 8 node system. */
> const unsigned long nodes = 0x3;
> set_mempolicy(MPOL_PREFER_MANY, &nodes, 8);

> /* Mimic interleave policy, but have fallback. */
> const unsigned long nodes = 0xaa;
> set_mempolicy(MPOL_PREFER_MANY, &nodes, 8);
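
A self-contained userspace sketch of the same idea (illustrative only: the
mode is not in the UAPI headers at this point in the series, so the
MPOL_PREFERRED_MANY value below is an assumption):

> /* Build with: gcc prefer_many.c -lnuma (numaif.h comes from libnuma). */
> #include <numaif.h>		/* set_mempolicy() */
> #include <stdio.h>
>
> #ifndef MPOL_PREFERRED_MANY
> #define MPOL_PREFERRED_MANY 5	/* assumed value, for illustration only */
> #endif
>
> int main(void)
> {
> 	/* Prefer the first two nodes of an 8-node system, with fallback. */
> 	unsigned long nodes = 0x3;
>
> 	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8) != 0)
> 		perror("set_mempolicy");
> 	return 0;
> }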

Some internal discussion took place around the interface. There are two
alternatives which we have discussed, plus one I stuck in:
1. Ordered list of nodes. Currently it's believed that the added complexity is
   not needed for expected use-cases.
2. A flag for bind to allow falling back to other nodes. This confuses the
   notion of binding and is less flexible than the current solution.
3. Create flags or new modes that help with some ordering. This offers both a
   friendlier API as well as a solution for more customized usage. It's unknown
   if it's worth the complexity to support this. Here is sample code for how
   this might work:

> // Prefer specific nodes for something wacky
> set_mempolicy(MPOL_PREFER_MANY, 0x17c, 1024);
>
> // Default
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_SOCKET, NULL, 0);
> // which is the same as
> set_mempolicy(MPOL_DEFAULT, NULL, 0);
>
> // The Hare
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE, NULL, 0);
>
> // The Tortoise
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE_REV, NULL, 0);
>
> // Prefer the fast memory of the first two sockets
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE, -1, 2);
>

In v1, Andi Kleen brought up reusing MPOL_PREFERRED as the mode for the API.
There wasn't consensus around this, so I've left the existing API as it was. I'm
open to more feedback here, but my slight preference is to use a new API as it
ensures if people are using it, they are entirely aware of what they're doing
and not accidentally misusing the old interface. (In a similar way to how
MPOL_LOCAL was introduced).

In v1, Michal also brought up renaming this MPOL_PREFERRED_MASK. I'm equally
fine with that change, but I hadn't heard much emphatic support for one way or
another, so I've left that too.

Changelog: 

  Since v3:
  * Rebased against v5.12-rc2
  * Drop the v3/0013 patch of creating NO_SLOWPATH gfp_mask bit
  * Skip direct reclaim for the first allocation try for
MPOL_PREFERRED_MANY, which makes its semantics close to
existing MPOL_PREFERRED policy

  Since v2:
  * Rebased against v5.11
  * Fix a stack overflow related panic, and a kernel warning (Feng)
  * Some code cleanup (Feng)
  * One RFC patch to speedup mem alloc in some case (Feng)

  Since v1:
  * Dropped patch to replace numa_node_id in some places (mhocko)
  * Dropped all the page allocation patches in favor of new mechanism to
use fallbacks. (mhocko)
  * Dropped the special snowflake preferred node algorithm (bwidawsk)
  * If the preferred node fails, ALL nodes are rechecked instead of just
the non-preferred nodes.

v4 Summary:
1: Random fix I found along the way
2-5: Represent node preference as a mask internally
6-7: Treat many preferred like bind
8-11: Handle page allocation for the new policy
12: Enable the uapi
13: unify 2 functions


[PATCH v4 01/13] mm/mempolicy: Add comment for missing LOCAL

2021-03-16 Thread Feng Tang
From: Ben Widawsky 

MPOL_LOCAL is a bit weird because it is simply a different name for an
existing behavior (preferred policy with no node mask). It has been this
way since it was added here:
commit 479e2802d09f ("mm: mempolicy: Make MPOL_LOCAL a real policy")

It is so similar to MPOL_PREFERRED in fact that when the policy is
created in mpol_new, the mode is set as PREFERRED, and an internal state
representing LOCAL doesn't exist.

To prevent future explorers from scratching their head as to why
MPOL_LOCAL isn't defined in the mpol_ops table, add a small comment
explaining the situation.

v2:
Change comment to refer to mpol_new (Michal)

Link: https://lore.kernel.org/r/20200630212517.308045-2-ben.widaw...@intel.com
Acked-by: Michal Hocko 
Signed-off-by: Ben Widawsky 
Signed-off-by: Feng Tang 
---
 mm/mempolicy.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ab51132..4193566 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -427,6 +427,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] 
= {
.create = mpol_new_bind,
.rebind = mpol_rebind_nodemask,
},
+   /* [MPOL_LOCAL] - see mpol_new() */
 };
 
 static int migrate_page_add(struct page *page, struct list_head *pagelist,
-- 
2.7.4



Re: [PATCH] scsi: ufs-pci: Add support for Intel LKF

2021-03-16 Thread Martin K. Petersen


Adrian,

> Add PCI ID and callbacks to support Intel LKF.

Applied to 5.13/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH] rxrpc: rxkad: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./net/rxrpc/rxkad.c:1140:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 net/rxrpc/rxkad.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index e2e9e9b..bfa3d9a 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1135,9 +1135,8 @@ static void rxkad_decrypt_response(struct 
rxrpc_connection *conn,
   ntohl(session_key->n[0]), ntohl(session_key->n[1]));
 
	mutex_lock(&rxkad_ci_mutex);
-   if (crypto_sync_skcipher_setkey(rxkad_ci, session_key->x,
-   sizeof(*session_key)) < 0)
-   BUG();
+   BUG_ON(crypto_sync_skcipher_setkey(rxkad_ci, session_key->x,
+   sizeof(*session_key)) < 0);
 
	memcpy(&iv, session_key, sizeof(iv));
 
-- 
1.8.3.1



Re: [PATCH] powerpc: arch/powerpc/kernel/setup_64.c - cleanup warnings

2021-03-16 Thread heying (H)

Thank you for your reply.


On 2021/3/17 11:04, Daniel Axtens wrote:

Hi He Ying,

Thank you for this patch.

I'm not sure what the precise rules for Fixes are, but I wonder if this
should have:

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")


Is that necessary for warning cleanups? I thought 'Fixes' tags are
needed only for bugfix patches. Can someone tell me whether I am right?



Those are the commits that added the entry_flush and uaccess_flush
symbols. Perhaps one for rfi_flush too but I'm not sure what commit
introduced that.

Kind regards,
Daniel


warning: symbol 'rfi_flush' was not declared.
warning: symbol 'entry_flush' was not declared.
warning: symbol 'uaccess_flush' was not declared.
We found warnings above in arch/powerpc/kernel/setup_64.c by using
the sparse tool.

Define 'entry_flush' and 'uaccess_flush' as static because they are not
referenced outside the file. Include asm/security_features.h in which
'rfi_flush' is declared.

Reported-by: Hulk Robot 
Signed-off-by: He Ying 
---
  arch/powerpc/kernel/setup_64.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 560ed8b975e7..f92d72a7e7ce 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -68,6 +68,7 @@
  #include 
  #include 
  #include 
+#include <asm/security_features.h>
  
  #include "setup.h"
  
@@ -949,8 +950,8 @@ static bool no_rfi_flush;

  static bool no_entry_flush;
  static bool no_uaccess_flush;
  bool rfi_flush;
-bool entry_flush;
-bool uaccess_flush;
+static bool entry_flush;
+static bool uaccess_flush;
  DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
  EXPORT_SYMBOL(uaccess_flush_key);
  
--

2.17.1



Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list

2021-03-16 Thread Huang, Ying
Yu Zhao  writes:

> On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote:
>> Yu Zhao  writes:
>> 
>> > On Tue, Mar 16, 2021 at 10:07:36AM +0800, Huang, Ying wrote:
>> >> Rik van Riel  writes:
>> >> 
>> >> > On Sat, 2021-03-13 at 00:57 -0700, Yu Zhao wrote:
>> >> >
>> >> >> +/*
>> >> >> + * After pages are faulted in, they become the youngest generation.
>> >> >> They must
>> >> >> + * go through aging process twice before they can be evicted. After
>> >> >> first scan,
>> >> >> + * their accessed bit set during initial faults are cleared and they
>> >> >> become the
>> >> >> + * second youngest generation. And second scan makes sure they
>> >> >> haven't been used
>> >> >> + * since the first.
>> >> >> + */
>> >> >
>> >> > I have to wonder if the reductions in OOM kills and 
>> >> > low-memory tab discards are due to this aging policy
>> >> > change, rather than from the switch to virtual scanning.
>> >
>> > There are no policy changes per se. The current page reclaim also
>> > scans a faulted-in page at least twice before it can reclaim it.
>> > That said, the new aging yields a better overall result because it
>> > discovers every page that has been referenced since the last scan,
>> > in addition to what Ying has mentioned. The current page scan stops
>> > once it finds enough candidates, which may seem more
>> > efficient, but actually pays the price for not finding the best.
>> >
>> >> If my understanding were correct, the temperature of the processes is
>> >> considered in addition to that of the individual pages.  That is, the
>> >> pages of the processes that haven't been scheduled after the previous
>> >> scanning will not be scanned.  I guess that this helps OOM kills?
>> >
>> > Yes, that's correct.
>> >
>> >> If so, how about just take advantage of that information for OOM killing
>> >> and page reclaiming?  For example, if a process hasn't been scheduled
>> >> for long time, just reclaim its private pages.
>> >
>> > This is how it works. Pages that haven't been scanned grow older
>> > automatically because those that have been scanned will be tagged with
>> > younger generation numbers. Eviction does bucket sort based on
>> > generation numbers and attacks the oldest.
>> 
>> Sorry, my original words are misleading.  What I wanted to say was that
>> is it good enough that
>> 
>> - Do not change the core algorithm of current page reclaiming.
>> 
>> - Add some new logic to reclaim the process private pages regardless of
>>   the Accessed bits if the processes are not scheduled for some long
>>   enough time.  This can be done before the normal page reclaiming.
>
> This is a good idea, which is being used on Android and Chrome OS. We
> call it per-process reclaim, and I've mentioned here:
> https://lore.kernel.org/linux-mm/ybkt6175gmmwb...@google.com/
>   On Android, our most advanced simulation that generates memory
>   pressure from realistic user behavior shows 18% fewer low-memory
>   kills, which in turn reduces cold starts by 16%. This is on top of
>   per-process reclaim, a predecessor of ``MADV_COLD`` and
>   ``MADV_PAGEOUT``, against background apps.

Thanks, now I see your improvement compared with the per-process
reclaim.  How about the per-process reclaim compared with the normal
page reclaiming for the similar test cases?

My intention behind this is that your solution includes several
improvements,

a) take advantage of scheduler information
b) more fine-grained active/inactive dividing
c) page table scanning instead of rmap

Is it possible to evaluate the benefit of each step?  And is there
still some potential to optimize the current LRU based algorithm before
adopting a totally new algorithm?

> The patches landed not long ago :) See mm/madvise.c

:) I'm too out-dated.

>> So this is an one small step improvement to the current page reclaiming
>> algorithm via taking advantage of the scheduler information.  It's
>> clearly not sophisticated as your new algorithm, for example, the cold
>> pages in the hot processes will not be reclaimed in this stage.  But it
>> can reduce the overhead of scanning too.
>
> The general problems with the direction of per-process reclaim:
>   1) we can't find the coldest pages, as you have mentioned.
>   2) we can't reach file pages accessed via file descriptors only,
>   especially those caching config files that were read only once.

In theory, this is possible: we can build an inode list based on the
access time too.  Although this may not be necessary.  We can reclaim
the read-once file cache before the per-process reclaim in theory.

>   3) we can't reclaim lru pages and slab objects proportionally and
>   therefore we leave many stale slab objects behind.
>   4) we have to be proactive, as you suggested (once again, you were
>   right), and this has a serious problem: client's battery life can
>   be affected.

Why can this not be done reactively?  We can start per-process reclaim
under memory pressure.  This has been used in 

Re: [PATCH 2/4] clocksource: riscv: Using CPUHP_AP_ONLINE_DYN

2021-03-16 Thread Palmer Dabbelt

On Mon, 01 Mar 2021 06:28:20 PST (-0800), guo...@kernel.org wrote:

From: Guo Ren 

Remove RISC-V clocksource custom definitions in hotplug.h:
 - CPUHP_AP_RISCV_TIMER_STARTING

For coding convention.
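
Note: with CPUHP_AP_ONLINE_DYN, cpuhp_setup_state() allocates the state
dynamically and returns the allocated state number (a positive value) on
success, which is why the error check in the diff below becomes
"if (error < 0)". A minimal sketch of the resulting pattern (illustration
only):

	int ret;

	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
				"clockevents/riscv/timer:starting",
				riscv_timer_starting_cpu,
				riscv_timer_dying_cpu);
	if (ret < 0)	/* only negative values indicate failure */
		pr_err("cpu hp setup state failed [%d]\n", ret);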

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: Linus Torvalds 
Cc: Anup Patel 
Cc: Christoph Hellwig 
Cc: Palmer Dabbelt 
Tested-by: Guo Ren 
Signed-off-by: Guo Ren 
Link: 
https://lore.kernel.org/lkml/CAHk-=wjM+kCsKqNdb=c0hKsv=J7-3Q1zmM15vp6_=8s5xfg...@mail.gmail.com/
---
 drivers/clocksource/timer-riscv.c | 4 ++--
 include/linux/cpuhotplug.h| 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/timer-riscv.c 
b/drivers/clocksource/timer-riscv.c
index c51c5ed..43aee27 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -150,10 +150,10 @@ static int __init riscv_timer_init_dt(struct device_node 
*n)
return error;
}

-   error = cpuhp_setup_state(CPUHP_AP_RISCV_TIMER_STARTING,
+   error = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 "clockevents/riscv/timer:starting",
 riscv_timer_starting_cpu, riscv_timer_dying_cpu);
-   if (error)
+   if (error < 0)
pr_err("cpu hp setup state failed for RISCV timer [%d]\n",
   error);
return error;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 14f49fd..f60538b 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -130,7 +130,6 @@ enum cpuhp_state {
CPUHP_AP_MARCO_TIMER_STARTING,
CPUHP_AP_MIPS_GIC_TIMER_STARTING,
CPUHP_AP_ARC_TIMER_STARTING,
-   CPUHP_AP_RISCV_TIMER_STARTING,
CPUHP_AP_CLINT_TIMER_STARTING,
CPUHP_AP_CSKY_TIMER_STARTING,
CPUHP_AP_HYPERV_TIMER_STARTING,


Acked-by: Palmer Dabbelt 

Just like the previous one.  Presumably CLINT is ours as well?

Thanks!


Re: [PATCH 1/4] irqchip: riscv: Using CPUHP_AP_ONLINE_DYN

2021-03-16 Thread Palmer Dabbelt

On Mon, 01 Mar 2021 06:28:19 PST (-0800), guo...@kernel.org wrote:

From: Guo Ren 

Remove RISC-V irqchip custom definitions in hotplug.h:
 - CPUHP_AP_IRQ_RISCV_STARTING
 - CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING

For coding convention.

Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: Linus Torvalds 
Cc: Palmer Dabbelt 
Cc: Anup Patel 
Cc: Atish Patra 
Cc: Christoph Hellwig 
Tested-by: Guo Ren 
Signed-off-by: Guo Ren 
Link: 
https://lore.kernel.org/lkml/CAHk-=wjM+kCsKqNdb=c0hKsv=J7-3Q1zmM15vp6_=8s5xfg...@mail.gmail.com/
---
 drivers/irqchip/irq-riscv-intc.c  | 2 +-
 drivers/irqchip/irq-sifive-plic.c | 2 +-
 include/linux/cpuhotplug.h| 2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
index 8017f6d..2c37f3a 100644
--- a/drivers/irqchip/irq-riscv-intc.c
+++ b/drivers/irqchip/irq-riscv-intc.c
@@ -125,7 +125,7 @@ static int __init riscv_intc_init(struct device_node *node,
return rc;
}

-   cpuhp_setup_state(CPUHP_AP_IRQ_RISCV_STARTING,
+   cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
  "irqchip/riscv/intc:starting",
  riscv_intc_cpu_starting,
  riscv_intc_cpu_dying);
diff --git a/drivers/irqchip/irq-sifive-plic.c 
b/drivers/irqchip/irq-sifive-plic.c
index 6f432d2..f499f1b 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -375,7 +375,7 @@ static int __init plic_init(struct device_node *node,
 */
	handler = this_cpu_ptr(&plic_handlers);
if (handler->present && !plic_cpuhp_setup_done) {
-   cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
+   cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
  "irqchip/sifive/plic:starting",
  plic_starting_cpu, plic_dying_cpu);
plic_cpuhp_setup_done = true;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f14adb8..14f49fd 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -103,8 +103,6 @@ enum cpuhp_state {
CPUHP_AP_IRQ_ARMADA_XP_STARTING,
CPUHP_AP_IRQ_BCM2836_STARTING,
CPUHP_AP_IRQ_MIPS_GIC_STARTING,
-   CPUHP_AP_IRQ_RISCV_STARTING,
-   CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
CPUHP_AP_ARM_MVEBU_COHERENCY,
CPUHP_AP_MICROCODE_LOADER,
CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING,


Acked-by: Palmer Dabbelt 

I'm going to assume this is going in through an irqchip tree, but LMK if you
want me to take it via mine.  This isn't really my sort of thing, so I'd prefer
at least an Ack.

Thanks!


Re: [PATCH] riscv: fix bugon.cocci warnings

2021-03-16 Thread Palmer Dabbelt

On Sun, 28 Feb 2021 03:10:22 PST (-0800), julia.law...@inria.fr wrote:

From: kernel test robot 

Use BUG_ON instead of a if condition followed by BUG.

Generated by: scripts/coccinelle/misc/bugon.cocci

Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
CC: Guo Ren 
Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
Signed-off-by: Julia Lawall 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   5695e51619745d4fe3ec2506a2f0cd982c5e27a4
commit: c22b0bcb1dd024cb9caad9230e3a387d8b061df5 riscv: Add kprobes supported
:: branch date: 3 hours ago
:: commit date: 6 weeks ago

 kprobes.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -256,8 +256,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr)
 * normal page fault.
 */
regs->epc = (unsigned long) cur->addr;
-   if (!instruction_pointer(regs))
-   BUG();
+   BUG_ON(!instruction_pointer(regs));

if (kcb->kprobe_status == KPROBE_REENTER)
restore_previous_kprobe(kcb);


Thanks, this is on fixes.


[PATCH] mm: Typo fix in the file util.c

2021-03-16 Thread Bhaskar Chowdhury



s/condtion/condition/

Signed-off-by: Bhaskar Chowdhury 
---
 mm/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 54870226cea6..f85da35b50eb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -775,7 +775,7 @@ int overcommit_policy_handler(struct ctl_table *table, int 
write, void *buffer,
 * The deviation of sync_overcommit_as could be big with loose policy
 * like OVERCOMMIT_ALWAYS/OVERCOMMIT_GUESS. When changing policy to
 * strict OVERCOMMIT_NEVER, we need to reduce the deviation to comply
-* with the strict "NEVER", and to avoid possible race condtion (even
+* with the strict "NEVER", and to avoid possible race condition (even
 * though user usually won't too frequently do the switching to policy
 * OVERCOMMIT_NEVER), the switch is done in the following order:
 *  1. changing the batch
--
2.30.2



Re: [PATCH 5/9] objtool: Rework rebuild_reloc logic

2021-03-16 Thread Josh Poimboeuf
On Fri, Mar 12, 2021 at 06:16:18PM +0100, Peter Zijlstra wrote:
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -479,6 +479,8 @@ void elf_add_reloc(struct elf *elf, struct reloc *reloc)
>  
>   list_add_tail(&reloc->list, &sec->reloc_list);
>   elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
> +
> + sec->rereloc = true;
>  }

Can we just reuse sec->changed for this?  Something like this on top
(untested of course):

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index addcec88ac9f..b9cb74a54681 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -480,7 +480,7 @@ void elf_add_reloc(struct elf *elf, struct reloc *reloc)
	list_add_tail(&reloc->list, &sec->reloc_list);
	elf_hash_add(elf->reloc_hash, &reloc->hash, reloc_hash(reloc));
 
-   sec->rereloc = true;
+   sec->changed = true;
 }
 
 static int read_rel_reloc(struct section *sec, int i, struct reloc *reloc, 
unsigned int *symndx)
@@ -882,9 +882,6 @@ int elf_rebuild_reloc_section(struct elf *elf, struct 
section *sec)
struct reloc *reloc;
int nr;
 
-   sec->changed = true;
-   elf->changed = true;
-
nr = 0;
	list_for_each_entry(reloc, &sec->reloc_list, list)
nr++;
@@ -894,8 +891,6 @@ int elf_rebuild_reloc_section(struct elf *elf, struct 
section *sec)
case SHT_RELA: return elf_rebuild_rela_reloc_section(sec, nr);
default:   return -1;
}
-
-   sec->rereloc = false;
 }
 
 int elf_write_insn(struct elf *elf, struct section *sec,
@@ -950,14 +945,15 @@ int elf_write(struct elf *elf)
struct section *sec;
Elf_Scn *s;
 
-   list_for_each_entry(sec, &elf->sections, list) {
-   if (sec->reloc && sec->reloc->rereloc)
-   elf_rebuild_reloc_section(elf, sec->reloc);
-   }
-
-   /* Update section headers for changed sections: */
+   /* Update changed relocation sections and section headers: */
	list_for_each_entry(sec, &elf->sections, list) {
if (sec->changed) {
+   if (sec->reloc &&
+   elf_rebuild_reloc_section(elf, sec->reloc)) {
+   WARN_ELF("elf_rebuild_reloc_section");
+   return -1;
+   }
+
s = elf_getscn(elf->elf, sec->idx);
if (!s) {
WARN_ELF("elf_getscn");
@@ -969,6 +965,7 @@ int elf_write(struct elf *elf)
}
 
sec->changed = false;
+   elf->changed = true;
}
}
 
diff --git a/tools/objtool/include/objtool/elf.h 
b/tools/objtool/include/objtool/elf.h
index 9fdd4c5f9f32..e6890cc70a25 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -39,7 +39,7 @@ struct section {
char *name;
int idx;
unsigned int len;
-   bool changed, text, rodata, noinstr, rereloc;
+   bool changed, text, rodata, noinstr;
 };
 
 struct symbol {



Re: [PATCH][next] scsi: mpt3sas: Replace unnecessary dynamic allocation with a static one

2021-03-16 Thread Martin K. Petersen


Gustavo,

> Dynamic memory allocation isn't actually needed and it can be replaced
> by statically allocating memory for struct object io_unit_pg3 with 36
> hardcoded entries for its GPIOVal array.

Applied to 5.13/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v3] Staging: rtl8192u: fixed a whitespace coding style issue

2021-03-16 Thread zhaoxiao
Removed additional whitespace in the r8192U_wx.c file.

Signed-off-by: zhaoxiao 
---
v3: add a description of why the patch is needed.
 drivers/staging/rtl8192u/r8192U_wx.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_wx.c 
b/drivers/staging/rtl8192u/r8192U_wx.c
index 5211b2005763..e916cf3ea74c 100644
--- a/drivers/staging/rtl8192u/r8192U_wx.c
+++ b/drivers/staging/rtl8192u/r8192U_wx.c
@@ -879,12 +879,10 @@ static iw_handler r8192_wx_handlers[] = {
 
 
 static const struct iw_priv_args r8192_private_args[] = {
-
{
SIOCIWFIRSTPRIV + 0x0,
IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, 0, "badcrc"
},
-
{
SIOCIWFIRSTPRIV + 0x1,
IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, 0, "activescan"
@@ -897,9 +895,7 @@ static const struct iw_priv_args r8192_private_args[] = {
{
SIOCIWFIRSTPRIV + 0x3,
IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, 0, "forcereset"
-
}
-
 };
 
 static iw_handler r8192_private_handler[] = {
-- 
2.20.1





[PATCH] scsi: ufs: Add selector to ufshcd_query_flag* APIs

2021-03-16 Thread Daejun Park
Unlike other query APIs in UFS, ufshcd_query_flag has a selector fixed
at 0. This patch allows the ufshcd_query_flag API to choose the selector
value via a parameter.
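
With the new signature, a caller that needs a non-zero selector can simply
pass one, e.g. (illustrative call only; the selector value is arbitrary):

	err = ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_READ_FLAG,
				QUERY_FLAG_IDN_FDEVICEINIT, 0, &flag_res, 1);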

Signed-off-by: Daejun Park 
---
 drivers/scsi/ufs/ufs-sysfs.c |  2 +-
 drivers/scsi/ufs/ufshcd.c| 29 +
 drivers/scsi/ufs/ufshcd.h|  2 +-
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index acc54f530f2d..606b058a3394 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -746,7 +746,7 @@ static ssize_t _name##_show(struct device *dev, 
\
index = ufshcd_wb_get_query_index(hba); \
pm_runtime_get_sync(hba->dev);  \
ret = ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_READ_FLAG,   \
-   QUERY_FLAG_IDN##_uname, index, &flag);  \
+   QUERY_FLAG_IDN##_uname, index, &flag, 0);   \
pm_runtime_put_sync(hba->dev);  \
if (ret) {  \
ret = -EINVAL;  \
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 8c0ff024231c..c2fd9c58d6b8 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2940,13 +2940,15 @@ static inline void ufshcd_init_query(struct ufs_hba 
*hba,
 }
 
 static int ufshcd_query_flag_retry(struct ufs_hba *hba,
-   enum query_opcode opcode, enum flag_idn idn, u8 index, bool *flag_res)
+   enum query_opcode opcode, enum flag_idn idn, u8 index, bool *flag_res,
+   u8 selector)
 {
int ret;
int retries;
 
for (retries = 0; retries < QUERY_REQ_RETRIES; retries++) {
-   ret = ufshcd_query_flag(hba, opcode, idn, index, flag_res);
+   ret = ufshcd_query_flag(hba, opcode, idn, index, flag_res,
+   selector);
if (ret)
dev_dbg(hba->dev,
"%s: failed with error %d, retries %d\n",
@@ -2969,15 +2971,17 @@ static int ufshcd_query_flag_retry(struct ufs_hba *hba,
  * @idn: flag idn to access
  * @index: flag index to access
  * @flag_res: the flag value after the query request completes
+ * @selector: selector field
  *
  * Returns 0 for success, non-zero in case of failure
  */
 int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
-   enum flag_idn idn, u8 index, bool *flag_res)
+   enum flag_idn idn, u8 index, bool *flag_res,
+   u8 selector)
 {
struct ufs_query_req *request = NULL;
struct ufs_query_res *response = NULL;
-   int err, selector = 0;
+   int err;
int timeout = QUERY_REQ_TIMEOUT;
 
BUG_ON(!hba);
@@ -4331,7 +4335,7 @@ static int ufshcd_complete_dev_init(struct ufs_hba *hba)
ktime_t timeout;
 
err = ufshcd_query_flag_retry(hba, UPIU_QUERY_OPCODE_SET_FLAG,
-   QUERY_FLAG_IDN_FDEVICEINIT, 0, NULL);
+   QUERY_FLAG_IDN_FDEVICEINIT, 0, NULL, 0);
if (err) {
dev_err(hba->dev,
"%s setting fDeviceInit flag failed with error %d\n",
@@ -4343,7 +4347,8 @@ static int ufshcd_complete_dev_init(struct ufs_hba *hba)
timeout = ktime_add_ms(ktime_get(), FDEVICEINIT_COMPL_TIMEOUT);
do {
err = ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_READ_FLAG,
-   QUERY_FLAG_IDN_FDEVICEINIT, 0, &flag_res);
+   QUERY_FLAG_IDN_FDEVICEINIT, 0, &flag_res,
+   0);
if (!flag_res)
break;
	usleep_range(5000, 10000);
@@ -5250,7 +5255,7 @@ static int ufshcd_enable_auto_bkops(struct ufs_hba *hba)
goto out;
 
err = ufshcd_query_flag_retry(hba, UPIU_QUERY_OPCODE_SET_FLAG,
-   QUERY_FLAG_IDN_BKOPS_EN, 0, NULL);
+   QUERY_FLAG_IDN_BKOPS_EN, 0, NULL, 0);
if (err) {
dev_err(hba->dev, "%s: failed to enable bkops %d\n",
__func__, err);
@@ -5300,7 +5305,7 @@ static int ufshcd_disable_auto_bkops(struct ufs_hba *hba)
}
 
err = ufshcd_query_flag_retry(hba, UPIU_QUERY_OPCODE_CLEAR_FLAG,
-   QUERY_FLAG_IDN_BKOPS_EN, 0, NULL);
+   QUERY_FLAG_IDN_BKOPS_EN, 0, NULL, 0);
if (err) {
dev_err(hba->dev, "%s: failed to disable bkops %d\n",
__func__, err);
@@ -5463,7 +5468,7 @@ int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable)
 
index = ufshcd_wb_get_query_index(hba);
ret = ufshcd_query_flag_retry(hba, opcode,
- 

Re: [PATCH v2] scsi: ufs: sysfs: Print string descriptors as raw data

2021-03-16 Thread Bart Van Assche
On 2/15/21 9:40 AM, Arthur Simchaev wrote:
> -#define UFS_STRING_DESCRIPTOR(_name, _pname) \
> +#define UFS_STRING_DESCRIPTOR(_name, _pname, _is_ascii)  \
>  static ssize_t _name##_show(struct device *dev,  
> \
>   struct device_attribute *attr, char *buf)   \
>  {\
> @@ -690,10 +690,18 @@ static ssize_t _name##_show(struct device *dev, 
> \
>   kfree(desc_buf);\
>   desc_buf = NULL;\
>   ret = ufshcd_read_string_desc(hba, index, &desc_buf,\
> -   SD_ASCII_STD);\
> +   _is_ascii);   \
>   if (ret < 0)\
>   goto out;   \
> - ret = sysfs_emit(buf, "%s\n", desc_buf);\
> + if (_is_ascii) {\
> + ret = sysfs_emit(buf, "%s\n", desc_buf);\
> + } else {\
> + int i;  \
> + \
> + for (i = 0; i < desc_buf[0]; i++)   \
> + hex_byte_pack(buf + i * 2, desc_buf[i]);\
> + ret = sysfs_emit(buf, "%s\n", buf); \
> + }   \
>  out: \
>   pm_runtime_put_sync(hba->dev);  \
>   kfree(desc_buf);\

Hex data needs to be parsed before it can be used by any software. Has
it been considered to make the "raw" attributes binary attributes
instead of hex-encoded binary? See also sysfs_create_bin_file().
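
For reference, a minimal sketch of that alternative (the handler name and
the cached-descriptor fields are assumptions, not the driver's actual code):

/* Sketch only: assumes the raw descriptor bytes were cached in
 * hba->desc_buf with their length in hba->desc_len (hypothetical fields).
 */
static ssize_t string_desc_read(struct file *file, struct kobject *kobj,
				struct bin_attribute *attr, char *buf,
				loff_t off, size_t count)
{
	struct ufs_hba *hba = dev_get_drvdata(kobj_to_dev(kobj));

	return memory_read_from_buffer(buf, count, &off,
				       hba->desc_buf, hba->desc_len);
}
static BIN_ATTR_RO(string_desc, QUERY_DESC_MAX_SIZE);

/* Registered once with:
 *	sysfs_create_bin_file(&hba->dev->kobj, &bin_attr_string_desc);
 * so userspace reads the raw bytes directly and no hex parsing is needed.
 */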

Thanks,

Bart.


[PATCH v2] sched: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./kernel/sched/core.c:8039:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
Changes in v2:
  - Replace BUG with BUG_ON.

 kernel/sched/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9819121..052f290 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8035,8 +8035,7 @@ void __init sched_init_smp(void)
	mutex_unlock(&sched_domains_mutex);
 
/* Move init over to a non-isolated CPU */
-   if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) 
< 0)
-   BUG();
+   BUG_ON(set_cpus_allowed_ptr(current, 
housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0);
sched_init_granularity();
 
init_sched_rt_class();
-- 
1.8.3.1



Re: [PATCH 0/3] AM64: Add SERDES DT bindings

2021-03-16 Thread Kishon Vijay Abraham I
Hi Vinod,

On 10/03/21 4:57 pm, Kishon Vijay Abraham I wrote:
> Patch series adds device tree bindings to support SERDES in AM64
> platform.
> 
> This is split from [1] since this binding is also required for AM64
> USB DT patches to be merged.
> 
> Vinod,
> 
> Once the 1st patch of the series is reviewed by Rob, can you merge and
> prepare a immutable tag to be used by Nishant Menon so that he can merge
> USB3 DT patches.

Now that Rob has Acked the 1st patch, can you prepare an immutable tag
for Nishant Menon on this series.

AM64 SERDES driver changes [1] can also be merged after this.

Thank You
Kishon

[1] -> http://lore.kernel.org/r/20210310120840.16447-1-kis...@ti.com
> 
> Changes from [1]:
> *) Reverted back to adding compatible under enum.
> 
> [1] -> http://lore.kernel.org/r/20210222112314.10772-1-kis...@ti.com
> 
> Kishon Vijay Abraham I (3):
>   dt-bindings: phy: ti,phy-j721e-wiz: Add bindings for AM64 SERDES
> Wrapper
>   dt-bindings: phy: cadence-torrent: Add binding for refclk driver
>   dt-bindings: ti-serdes-mux: Add defines for AM64 SoC
> 
>  .../bindings/phy/phy-cadence-torrent.yaml | 20 +++---
>  .../bindings/phy/ti,phy-j721e-wiz.yaml|  4 
>  include/dt-bindings/mux/ti-serdes.h   |  5 +
>  include/dt-bindings/phy/phy-cadence-torrent.h |  2 ++
>  include/dt-bindings/phy/phy-ti.h  | 21 +++
>  5 files changed, 49 insertions(+), 3 deletions(-)
>  create mode 100644 include/dt-bindings/phy/phy-ti.h
> 


[PATCH] kernel: Fix a typo in the file up.c

2021-03-16 Thread Bhaskar Chowdhury


s/condtions/conditions/

Signed-off-by: Bhaskar Chowdhury 
---
 Adding Andrew to the To list, because this file has no maintainer attached

 kernel/up.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/up.c b/kernel/up.c
index c6f323dcd45b..1b9b135e77dd 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -67,7 +67,7 @@ EXPORT_SYMBOL(on_each_cpu_mask);

 /*
  * Preemption is disabled here to make sure the cond_func is called under the
- * same condtions in UP and SMP.
+ * same conditions in UP and SMP.
  */
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
   void *info, bool wait, const struct cpumask *mask)
--
2.30.2



Re: v3: scsi: ufshcd: use a macro for UFS versions

2021-03-16 Thread Martin K. Petersen


Caleb,

> When using a device with UFS > 2.1 the error "invalid UFS version" is
> misleadingly printed. There was a patch for this almost a year
> ago, in which this solution was suggested.

Applied to 5.13/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 4/9] objtool: Fix static_call list generation

2021-03-16 Thread Josh Poimboeuf
On Fri, Mar 12, 2021 at 06:16:17PM +0100, Peter Zijlstra wrote:
> @@ -1701,6 +1706,9 @@ static int decode_sections(struct objtoo
>   if (ret)
>   return ret;
>  
> + /*
> +  * Must be before add_{jump_call}_desetination.
> +  */

s/desetination/destination/

-- 
Josh



A problem of Intel IOMMU hardware ?

2021-03-16 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi guys,

We find that the Intel IOMMU cache (i.e. the IOTLB) may work incorrectly in a
special situation, which can cause DMA failures or fetch wrong data.

The reproducer (based on Alex's vfio testsuite[1]) is in attachment, it can
reproduce the problem with high probability (~50%).

The machine we used is:
processor   : 47
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
stepping: 4
microcode   : 0x269

And the iommu capability reported is:
ver 1:0 cap 8d2078c106f0466 ecap f020df
(caching mode = 0 , page-selective invalidation = 1)

(The problem is also on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz' and
'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz')

We run the reproducer on Linux 4.18 and it works as follow:

Step 1. alloc 4G *2M-hugetlb* memory (N.B. no problem with 4K-page mapping)
Step 2. DMA Map 4G memory
Step 3.
while (1) {
{UNMAP, 0x0, 0xa},  (a)
{UNMAP, 0xc, 0xbff4},
{MAP,   0x0, 0xc000}, - (b)
use GDB to pause at here, and then DMA read IOVA=0,
sometimes DMA success (as expected),
but sometimes DMA error (report not-present).
{UNMAP, 0x0, 0xc000}, - (c)
{MAP,   0x0, 0xa},
{MAP,   0xc, 0xbff4},
}

The DMA read operations should succeed between (b) and (c); at the very
least, they should NOT report not-present!

After analyzing the problem, we think it may be caused by the Intel IOMMU
IOTLB. It seems the DMA Remapping hardware still uses the IOTLB or other
caches from (a).

When we do the DMA unmap at (a), the IOTLB will be flushed:
intel_iommu_unmap
domain_unmap
iommu_flush_iotlb_psi

When we do the DMA map at (b), there is no need to flush the IOTLB according
to the capability of this IOMMU:
intel_iommu_map
domain_pfn_mapping
domain_mapping
__mapping_notify_one
if (cap_caching_mode(iommu->cap)) // FALSE
iommu_flush_iotlb_psi
But the problem will disappear if we FORCE flush here. So we suspect the iommu
hardware.

Do you have any suggestions?
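
For convenience, the core map/unmap helpers of the reproducer can be
sketched with the type1 IOMMU ioctls as below (simplified; container,
group and device setup are omitted, see the attachment for the full code):

#include <linux/vfio.h>
#include <sys/ioctl.h>

static int dma_map(int container, void *vaddr, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)(unsigned long)vaddr,
		.iova  = iova,
		.size  = size,
	};
	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

static int dma_unmap(int container, __u64 iova, __u64 size)
{
	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.iova  = iova,
		.size  = size,
	};
	return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}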







/*
 * VFIO API definition
 *
 * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
 * Author: Alex Williamson 
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#ifndef _UAPIVFIO_H
#define _UAPIVFIO_H

#include <linux/types.h>
#include <linux/ioctl.h>

#define VFIO_API_VERSION	0


/* Kernel & User level defines for VFIO IOCTLs. */

/* Extensions */

#define VFIO_TYPE1_IOMMU	1

/*
 * The IOCTL interface is designed for extensibility by embedding the
 * structure length (argsz) and flags into structures passed between
 * kernel and userspace.  We therefore use the _IO() macro for these
 * defines to avoid implicitly embedding a size into the ioctl request.
 * As structure fields are added, argsz will increase to match and flag
 * bits will be defined to indicate additional fields with valid data.
 * It's *always* the caller's responsibility to indicate the size of
 * the structure passed by setting argsz appropriately.
 */

#define VFIO_TYPE   (';')
#define VFIO_BASE   100

/*  IOCTLs for VFIO file descriptor (/dev/vfio/vfio)  */

/**
 * VFIO_GET_API_VERSION - _IO(VFIO_TYPE, VFIO_BASE + 0)
 *
 * Report the version of the VFIO API.  This allows us to bump the entire
 * API version should we later need to add or change features in incompatible
 * ways.
 * Return: VFIO_API_VERSION
 * Availability: Always
 */
#define VFIO_GET_API_VERSION	_IO(VFIO_TYPE, VFIO_BASE + 0)

/**
 * VFIO_CHECK_EXTENSION - _IOW(VFIO_TYPE, VFIO_BASE + 1, __u32)
 *
 * Check whether an extension is supported.
 * Return: 0 if not supported, 1 (or some other positive integer) if supported.
 * Availability: Always
 */
#define VFIO_CHECK_EXTENSION	_IO(VFIO_TYPE, VFIO_BASE + 1)

/**
 * VFIO_SET_IOMMU - _IOW(VFIO_TYPE, VFIO_BASE + 2, __s32)
 *
 * Set the iommu to the given type.  The type must be supported by an
 * iommu driver as verified by calling CHECK_EXTENSION using the same
 * type.  A group must be set to this file descriptor before this
 * ioctl is available.  The IOMMU interfaces enabled by this call are
 * specific to the value set.
 * Return: 0 on success, -errno on failure
 * Availability: When VFIO group attached
 */
#define VFIO_SET_IOMMU  _IO(VFIO_TYPE, VFIO_BASE + 2)

/*  IOCTLs for GROUP file descriptors (/dev/vfio/$GROUP)  */

/**
 * VFIO_GROUP_GET_STATUS - _IOR(VFIO_TYPE, VFIO_BASE + 3,
 *  struct vfio_group_status)
 *
 * Retrieve information about the group.  Fills in provided
 * struct vfio_group_info.  Caller sets argsz.
 

[PATCH v2] sparc64: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./arch/sparc/kernel/traps_64.c:419:2-5: WARNING: Use BUG_ON instead of
if condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
Changes in v2:
  - Replace BUG with BUG_ON.

 arch/sparc/kernel/traps_64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index a850dcc..78d04b2 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -415,8 +415,7 @@ static void spitfire_clean_and_reenable_l1_caches(void)
 {
unsigned long va;
 
-   if (tlb_type != spitfire)
-   BUG();
+   BUG_ON(tlb_type != spitfire);
 
/* Clean 'em. */
for (va =  0; va < (PAGE_SIZE << 1); va += 32) {
-- 
1.8.3.1



[PATCH] ia64: hp: common: A typo fix in the file sba_iommu.c

2021-03-16 Thread Bhaskar Chowdhury



s/minium/minimum/


Signed-off-by: Bhaskar Chowdhury 
---
 arch/ia64/hp/common/sba_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 9148ddbf02e5..3dcb8c35faad 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -828,7 +828,7 @@ mark_clean (void *addr, size_t size)
  * corresponding IO TLB entry. The PCOM (Purge Command Register)
  * is to purge stale entries in the IO TLB when unmapping entries.
  *
- * The PCOM register supports purging of multiple pages, with a minium
+ * The PCOM register supports purging of multiple pages, with a minimum
  * of 1 page and a maximum of 2GB. Hardware requires the address be
  * aligned to the size of the range being purged. The size of the range
  * must be a power of 2. The "Cool perf optimization" in the
--
2.30.2



Re: [PATCH v3 06/15] media: mtk-vcodec: vdec: support stateless H.264 decoding

2021-03-16 Thread Alexandre Courbot
On Tue, Mar 16, 2021 at 12:21 AM Nicolas Dufresne  wrote:
>
> On Monday, March 15, 2021 at 20:28 +0900, Alexandre Courbot wrote:
> > Hi Ezequiel,
> >
> > On Thu, Mar 4, 2021 at 6:47 AM Ezequiel Garcia
> >  wrote:
> > >
> > >  Hi Alex,
> > >
> > > Thanks for the patch.
> > >
> > > On Fri, 26 Feb 2021 at 07:06, Alexandre Courbot 
> > > wrote:
> > > >
> > > > From: Yunfei Dong 
> > > >
> > > > Add support for H.264 decoding using the stateless API, as supported by
> > > > MT8183. This support takes advantage of the V4L2 H.264 reference list
> > > > builders.
> > > >
> > > > Signed-off-by: Yunfei Dong 
> > > > [acourbot: refactor, cleanup and split]
> > > > Co-developed-by: Alexandre Courbot 
> > > > Signed-off-by: Alexandre Courbot 
> > > > ---
> > > >  drivers/media/platform/Kconfig|   1 +
> > > >  drivers/media/platform/mtk-vcodec/Makefile|   1 +
> > > >  .../mtk-vcodec/vdec/vdec_h264_req_if.c| 807 ++
> > > >  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   3 +
> > > >  .../media/platform/mtk-vcodec/vdec_drv_if.h   |   1 +
> > > >  5 files changed, 813 insertions(+)
> > > >  create mode 100644 drivers/media/platform/mtk-
> > > > vcodec/vdec/vdec_h264_req_if.c
> > > >
> > > > diff --git a/drivers/media/platform/Kconfig
> > > > b/drivers/media/platform/Kconfig
> > > > index fd1831e97b22..c27db5643712 100644
> > > > --- a/drivers/media/platform/Kconfig
> > > > +++ b/drivers/media/platform/Kconfig
> > > > @@ -295,6 +295,7 @@ config VIDEO_MEDIATEK_VCODEC
> > > > select V4L2_MEM2MEM_DEV
> > > > select VIDEO_MEDIATEK_VCODEC_VPU if VIDEO_MEDIATEK_VPU
> > > > select VIDEO_MEDIATEK_VCODEC_SCP if MTK_SCP
> > > > +   select V4L2_H264
> > > > help
> > > >   Mediatek video codec driver provides HW capability to
> > > >   encode and decode in a range of video formats on MT8173
> > > > diff --git a/drivers/media/platform/mtk-vcodec/Makefile
> > > > b/drivers/media/platform/mtk-vcodec/Makefile
> > > > index 4ba93d838ab6..ca8e9e7a9c4e 100644
> > > > --- a/drivers/media/platform/mtk-vcodec/Makefile
> > > > +++ b/drivers/media/platform/mtk-vcodec/Makefile
> > > > @@ -7,6 +7,7 @@ obj-$(CONFIG_VIDEO_MEDIATEK_VCODEC) += mtk-vcodec-dec.o 
> > > > \
> > > >  mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
> > > > vdec/vdec_vp8_if.o \
> > > > vdec/vdec_vp9_if.o \
> > > > +   vdec/vdec_h264_req_if.o \
> > > > mtk_vcodec_dec_drv.o \
> > > > vdec_drv_if.o \
> > > > vdec_vpu_if.o \
> > > > diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > > b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > > new file mode 100644
> > > > index ..2fbbfbbcfbec
> > > > --- /dev/null
> > > > +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > > @@ -0,0 +1,807 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#include "../vdec_drv_if.h"
> > > > +#include "../mtk_vcodec_util.h"
> > > > +#include "../mtk_vcodec_dec.h"
> > > > +#include "../mtk_vcodec_intr.h"
> > > > +#include "../vdec_vpu_if.h"
> > > > +#include "../vdec_drv_base.h"
> > > > +
> > > > +#define NAL_NON_IDR_SLICE  0x01
> > > > +#define NAL_IDR_SLICE  0x05
> > > > +#define NAL_H264_PPS   0x08
> > >
> > > Not used?
> > >
> > > > +#define NAL_TYPE(value)((value) & 0x1F)
> > > > +
> > >
> > > I believe you may not need the NAL type.
> >
> > True, removed this block of defines.
> >
> > >
> > > > +#define BUF_PREDICTION_SZ  (64 * 4096)
> > > > +#define MB_UNIT_LEN16
> > > > +
> > > > +/* get used parameters for sps/pps */
> > > > +#define GET_MTK_VDEC_FLAG(cond, flag) \
> > > > +   { dst_param->cond = ((src_param->flags & flag) ? (1) : (0)); }
> > > > +#define GET_MTK_VDEC_PARAM(param) \
> > > > +   { dst_param->param = src_param->param; }
> > > > +/* motion vector size (bytes) for every macro block */
> > > > +#define HW_MB_STORE_SZ 64
> > > > +
> > > > +#define H264_MAX_FB_NUM17
> > > > +#define H264_MAX_MV_NUM32
> > > > +#define HDR_PARSING_BUF_SZ 1024
> > > > +
> > > > +/**
> > > > + * struct mtk_h264_dpb_info  - h264 dpb information
> > > > + * @y_dma_addr: Y bitstream physical address
> > > > + * @c_dma_addr: CbCr bitstream physical address
> > > > + * @reference_flag: reference picture flag (short/long term reference
> > > > picture)
> > > > + * @field: field picture flag
> > > > + */
> > > > +struct mtk_h264_dpb_info {
> > > > +   dma_addr_t y_dma_addr;
> > > > +   dma_addr_t c_dma_addr;
> > > > +   int reference_flag;
> > > > +   int field;
> > > > 

Re: [PATCH v3 06/15] media: mtk-vcodec: vdec: support stateless H.264 decoding

2021-03-16 Thread Alexandre Courbot
 On Tue, Mar 16, 2021 at 7:08 AM Ezequiel Garcia
 wrote:
>
> Hi Alex,
>
> On Mon, 15 Mar 2021 at 08:28, Alexandre Courbot  wrote:
> >
> > Hi Ezequiel,
> >
> > On Thu, Mar 4, 2021 at 6:47 AM Ezequiel Garcia
> >  wrote:
> > >
> > >  Hi Alex,
> > >
> > > Thanks for the patch.
> > >
> > > On Fri, 26 Feb 2021 at 07:06, Alexandre Courbot  
> > > wrote:
> > > >
> > > > From: Yunfei Dong 
> > > >
> > > > Add support for H.264 decoding using the stateless API, as supported by
> > > > MT8183. This support takes advantage of the V4L2 H.264 reference list
> > > > builders.
> > > >
> > > > Signed-off-by: Yunfei Dong 
> > > > [acourbot: refactor, cleanup and split]
> > > > Co-developed-by: Alexandre Courbot 
> > > > Signed-off-by: Alexandre Courbot 
> > > > ---
> > > >  drivers/media/platform/Kconfig|   1 +
> > > >  drivers/media/platform/mtk-vcodec/Makefile|   1 +
> > > >  .../mtk-vcodec/vdec/vdec_h264_req_if.c| 807 ++
> > > >  .../media/platform/mtk-vcodec/vdec_drv_if.c   |   3 +
> > > >  .../media/platform/mtk-vcodec/vdec_drv_if.h   |   1 +
> > > >  5 files changed, 813 insertions(+)
> > > >  create mode 100644 
> > > > drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > >
> > > > diff --git a/drivers/media/platform/Kconfig 
> > > > b/drivers/media/platform/Kconfig
> > > > index fd1831e97b22..c27db5643712 100644
> > > > --- a/drivers/media/platform/Kconfig
> > > > +++ b/drivers/media/platform/Kconfig
> > > > @@ -295,6 +295,7 @@ config VIDEO_MEDIATEK_VCODEC
> > > > select V4L2_MEM2MEM_DEV
> > > > select VIDEO_MEDIATEK_VCODEC_VPU if VIDEO_MEDIATEK_VPU
> > > > select VIDEO_MEDIATEK_VCODEC_SCP if MTK_SCP
> > > > +   select V4L2_H264
> > > > help
> > > >   Mediatek video codec driver provides HW capability to
> > > >   encode and decode in a range of video formats on MT8173
> > > > diff --git a/drivers/media/platform/mtk-vcodec/Makefile 
> > > > b/drivers/media/platform/mtk-vcodec/Makefile
> > > > index 4ba93d838ab6..ca8e9e7a9c4e 100644
> > > > --- a/drivers/media/platform/mtk-vcodec/Makefile
> > > > +++ b/drivers/media/platform/mtk-vcodec/Makefile
> > > > @@ -7,6 +7,7 @@ obj-$(CONFIG_VIDEO_MEDIATEK_VCODEC) += mtk-vcodec-dec.o 
> > > > \
> > > >  mtk-vcodec-dec-y := vdec/vdec_h264_if.o \
> > > > vdec/vdec_vp8_if.o \
> > > > vdec/vdec_vp9_if.o \
> > > > +   vdec/vdec_h264_req_if.o \
> > > > mtk_vcodec_dec_drv.o \
> > > > vdec_drv_if.o \
> > > > vdec_vpu_if.o \
> > > > diff --git a/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c 
> > > > b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > > new file mode 100644
> > > > index ..2fbbfbbcfbec
> > > > --- /dev/null
> > > > +++ b/drivers/media/platform/mtk-vcodec/vdec/vdec_h264_req_if.c
> > > > @@ -0,0 +1,807 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +#include "../vdec_drv_if.h"
> > > > +#include "../mtk_vcodec_util.h"
> > > > +#include "../mtk_vcodec_dec.h"
> > > > +#include "../mtk_vcodec_intr.h"
> > > > +#include "../vdec_vpu_if.h"
> > > > +#include "../vdec_drv_base.h"
> > > > +
> > > > +#define NAL_NON_IDR_SLICE  0x01
> > > > +#define NAL_IDR_SLICE  0x05
> > > > +#define NAL_H264_PPS   0x08
> > >
> > > Not used?
> > >
> > > > +#define NAL_TYPE(value)((value) & 0x1F)
> > > > +
> > >
> > > I believe you may not need the NAL type.
> >
> > True, removed this block of defines.
> >
> > >
> > > > +#define BUF_PREDICTION_SZ  (64 * 4096)
> > > > +#define MB_UNIT_LEN16
> > > > +
> > > > +/* get used parameters for sps/pps */
> > > > +#define GET_MTK_VDEC_FLAG(cond, flag) \
> > > > +   { dst_param->cond = ((src_param->flags & flag) ? (1) : (0)); }
> > > > +#define GET_MTK_VDEC_PARAM(param) \
> > > > +   { dst_param->param = src_param->param; }
> > > > +/* motion vector size (bytes) for every macro block */
> > > > +#define HW_MB_STORE_SZ 64
> > > > +
> > > > +#define H264_MAX_FB_NUM17
> > > > +#define H264_MAX_MV_NUM32
> > > > +#define HDR_PARSING_BUF_SZ 1024
> > > > +
> > > > +/**
> > > > + * struct mtk_h264_dpb_info  - h264 dpb information
> > > > + * @y_dma_addr: Y bitstream physical address
> > > > + * @c_dma_addr: CbCr bitstream physical address
> > > > + * @reference_flag: reference picture flag (short/long term reference 
> > > > picture)
> > > > + * @field: field picture flag
> > > > + */
> > > > +struct mtk_h264_dpb_info {
> > > > +   dma_addr_t y_dma_addr;
> > > > +   dma_addr_t c_dma_addr;
> > > > +   int reference_flag;
> > > > +   int 

Re: [PATCH v3 05/15] media: mtk-vcodec: vdec: support stateless API

2021-03-16 Thread Alexandre Courbot
On Tue, Mar 16, 2021 at 6:45 AM Ezequiel Garcia
 wrote:
>
> Hi Alexandre,
>
> On Mon, 15 Mar 2021 at 08:28, Alexandre Courbot  wrote:
> >
> > Hi Ezequiel, thanks for the feedback!
> >
> > On Thu, Mar 4, 2021 at 6:30 AM Ezequiel Garcia
> >  wrote:
> > >
> > > Hello Alex,
> > >
> > > Thanks for the patch.
> > >
> > > On Fri, 26 Feb 2021 at 07:06, Alexandre Courbot  
> > > wrote:
> > > >
> > > > From: Yunfei Dong 
> > > >
> > > > Support the stateless codec API that will be used by MT8183.
> > > >
> > > > Signed-off-by: Yunfei Dong 
> > > > [acourbot: refactor, cleanup and split]
> > > > Co-developed-by: Alexandre Courbot 
> > > > Signed-off-by: Alexandre Courbot 
> > > > ---
> > > >  drivers/media/platform/mtk-vcodec/Makefile|   1 +
> > > >  .../platform/mtk-vcodec/mtk_vcodec_dec.c  |  66 ++-
> > > >  .../platform/mtk-vcodec/mtk_vcodec_dec.h  |   9 +-
> > > >  .../mtk-vcodec/mtk_vcodec_dec_stateless.c | 427 ++
> > > >  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |   3 +
> > > >  5 files changed, 503 insertions(+), 3 deletions(-)
> > > >  create mode 100644 
> > > > drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> > > >
> > > [..]
> > >
> > > > +
> > > > +static const struct mtk_stateless_control mtk_stateless_controls[] = {
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_STATELESS_H264_SPS,
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   .needed_in_request = true,
> > >
> > > This "needed_in_request" is not really required, as controls
> > > are not volatile, and their value is stored per-context (per-fd).
> > >
> > > It's perfectly valid for an application to pass the SPS control
> > > at the beginning of the sequence, and then omit it
> > > in further requests.
> >
> > If I understand how v4l2_ctrl_request_hdl_ctrl_find() works with
> > requests, this boolean only checks that the control has been provided
> > at least once, and not that it is provided with every request. Without
> > it we could send a frame to the firmware without e.g. setting an SPS,
> > which would be a problem.
> >
>
> As Nicolas points out, in V4L2 controls have an initial value,
> so no control can be unset.

I see. So I guess the expectation is that failure will occur later as
the firmware reports it cannot decode properly (or returns a corrupted
frame). Thanks for the precision.

>
> > >
> > > > +   },
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_STATELESS_H264_PPS,
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   .needed_in_request = true,
> > > > +   },
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_STATELESS_H264_SCALING_MATRIX,
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   .needed_in_request = true,
> > > > +   },
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_STATELESS_H264_DECODE_PARAMS,
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   .needed_in_request = true,
> > > > +   },
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_MPEG_VIDEO_H264_PROFILE,
> > > > +   .def = V4L2_MPEG_VIDEO_H264_PROFILE_MAIN,
> > > > +   .max = V4L2_MPEG_VIDEO_H264_PROFILE_HIGH,
> > > > +   .menu_skip_mask =
> > > > +   
> > > > BIT(V4L2_MPEG_VIDEO_H264_PROFILE_BASELINE) |
> > > > +   
> > > > BIT(V4L2_MPEG_VIDEO_H264_PROFILE_EXTENDED),
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   },
> > > > +   {
> > > > +   .cfg = {
> > > > +   .id = V4L2_CID_STATELESS_H264_DECODE_MODE,
> > > > +   .min = 
> > > > V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
> > > > +   .def = 
> > > > V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
> > > > +   .max = 
> > > > V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED,
> > > > +   },
> > > > +   .codec_type = V4L2_PIX_FMT_H264_SLICE,
> > > > +   },
> > > > +};
> > >
> > > Applications also need to know which V4L2_CID_STATELESS_H264_START_CODE
> > > the driver supports. From a next patch, this case seems to be
> > > V4L2_STATELESS_H264_START_CODE_ANNEX_B.
> >
> > Indeed - I've added the control, thanks for catching this!
> >
> > >
> > > > +#define NUM_CTRLS ARRAY_SIZE(mtk_stateless_controls)
> > > > +
> > > > +static const struct mtk_video_fmt mtk_video_formats[] = {
> > > > +   {
> > > > +   .fourcc = V4L2_PIX_FMT_H264_SLICE,
> > > > +   

Re: [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log

2021-03-16 Thread Yi Sun
On 21-03-16 19:39:47, Keqian Zhu wrote:
> Hi Yi,
> 
> On 2021/3/16 17:17, Yi Sun wrote:
> > On 21-03-10 17:06:07, Keqian Zhu wrote:
> >> From: jiangkunkun 
> >>
> >> Block descriptor is not a proper granule for dirty log tracking.
> >> Take an extreme example, if DMA writes one byte, under 1G mapping,
> >> the dirty amount reported to userspace is 1G, but under 4K mapping,
> >> the dirty amount is just 4K.
> >>
> >> This adds a new interface named start_dirty_log in the iommu layer and
> >> arm smmuv3 implements it, which splits block descriptors into a span
> >> of page descriptors. Other types of IOMMU will perform architecture
> >> specific actions to start dirty log.
> >>
> >> To allow code reuse, the split_block operation is realized as an
> >> iommu_ops too. We flush all iotlbs after the whole procedure is
> >> completed to ease the pressure on the iommu, as we will handle a huge
> >> range of mappings in general.
> >>
> >> Splitting blocks does not simultaneously work with other pgtable ops,
> >> as the only designed user is vfio, which always holds a lock, so race
> >> conditions are not considered in the pgtable ops.
> >>
> >> Co-developed-by: Keqian Zhu 
> >> Signed-off-by: Kunkun Jiang 
> >> ---
> >>
> >> changelog:
> >>
> >> v2:
> >>  - Change the return type of split_block(). size_t -> int.
> >>  - Change commit message to properly describe race condition. (Robin)
> >>  - Change commit message to properly describe the need of split block.
> >>  - Add a new interface named start_dirty_log(). (Sun Yi)
> >>  - Change commit message to explain the relationship of split_block() and 
> >> start_dirty_log().
> >>
> >> ---
> >>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +
> >>  drivers/iommu/io-pgtable-arm.c  | 122 
> >>  drivers/iommu/iommu.c   |  48 
> >>  include/linux/io-pgtable.h  |   2 +
> >>  include/linux/iommu.h   |  24 
> >>  5 files changed, 248 insertions(+)
> >>
> > Could you please split the iommu common interface into a separate patch?
> > This may make review and comments easier.
> Yup, good suggestion.
> 
> > 
> > IMHO, I think the start/stop interfaces could be merged into one, e.g:
> > int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
> >unsigned long iova, size_t size,
> >int prot);
> Looks good, this reduces some code. But I have a concern that this causes
> a loss of flexibility, as we must pass the same arguments when starting and
> stopping the dirty log. What's your opinion about this?
> 
Per the current design, the start/stop interfaces have similar arguments. So I
think it is ok for now. For future extension, we may consider defining a
structure to pass these arguments.
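
For illustration, the merged interface plus the argument structure
mentioned above could look roughly like this (the names are illustrative
only, not part of this series):

	struct iommu_hwdbm_args {
		unsigned long iova;
		size_t size;
		int prot;
	};

	int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
				   const struct iommu_hwdbm_args *args);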

> > 
> > Same comments to patch 5.
> OK. Thanks.
> 
> > 
> > BRs,
> > Yi Sun
> > 
> >> -- 
> >> 2.19.1
> > .
> Thanks,
> Keqian

BRs,
Yi Sun


[PATCH v3] bpf: Fix memory leak in copy_process()

2021-03-16 Thread qiang . zhang
From: Zqiang 

syzbot reports a memory leak as follows:
BUG: memory leak
unreferenced object 0x888101b41d00 (size 120):
  comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
  backtrace:
[] alloc_pid+0x66/0x560
[] copy_process+0x1465/0x25e0
[] kernel_clone+0xf3/0x670
[] kernel_thread+0x61/0x80
[] call_usermodehelper_exec_work
[] call_usermodehelper_exec_work+0xc4/0x120
[] process_one_work+0x2c9/0x600
[] worker_thread+0x59/0x5d0
[] kthread+0x178/0x1b0
[] ret_from_fork+0x1f/0x30

unreferenced object 0x888110ef5c00 (size 232):
  comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
  backtrace:
[] kmem_cache_zalloc
[] __alloc_file+0x1f/0xf0
[] alloc_empty_file+0x69/0x120
[] alloc_file+0x33/0x1b0
[] alloc_file_pseudo+0xb2/0x140
[] create_pipe_files+0x138/0x2e0
[] umd_setup+0x33/0x220
[] call_usermodehelper_exec_async+0xb4/0x1b0
[] ret_from_fork+0x1f/0x30

After the UMD process exits, the pipe_to_umh/pipe_from_umh and tgid
need to be released.

Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that 
populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b...@syzkaller.appspotmail.com
Signed-off-by: Zqiang 
---
 v1->v2:
 Check whether the tgid pointer is valid.
 v2->v3:
 Add a common umd_cleanup_helper() and export it as a
 symbol which drivers can use.

 include/linux/usermode_driver.h   |  1 +
 kernel/bpf/preload/bpf_preload_kern.c | 15 +++
 kernel/usermode_driver.c  | 18 ++
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h
index 073a9e0ec07d..ad970416260d 100644
--- a/include/linux/usermode_driver.h
+++ b/include/linux/usermode_driver.h
@@ -14,5 +14,6 @@ struct umd_info {
 int umd_load_blob(struct umd_info *info, const void *data, size_t len);
 int umd_unload_blob(struct umd_info *info);
 int fork_usermode_driver(struct umd_info *info);
+void umd_cleanup_helper(struct umd_info *info);
 
 #endif /* __LINUX_USERMODE_DRIVER_H__ */
diff --git a/kernel/bpf/preload/bpf_preload_kern.c 
b/kernel/bpf/preload/bpf_preload_kern.c
index 79c5772465f1..356c4ca4f530 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -61,8 +61,10 @@ static int finish(void)
if (n != sizeof(magic))
return -EPIPE;
tgid = umd_ops.info.tgid;
-   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
-   umd_ops.info.tgid = NULL;
+   if (tgid) {
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   umd_cleanup_helper(&umd_ops.info);
+   }
return 0;
 }
 
@@ -80,10 +82,15 @@ static int __init load_umd(void)
 
 static void __exit fini_umd(void)
 {
+   struct pid *tgid;
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
-   kill_pid(umd_ops.info.tgid, SIGKILL, 1);
-   umd_ops.info.tgid = NULL;
+   tgid = umd_ops.info.tgid;
+   if (tgid) {
+   kill_pid(tgid, SIGKILL, 1);
+   wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
+   umd_cleanup_helper(&umd_ops.info);
+   }
umd_unload_blob(&umd_ops.info);
 }
 late_initcall(load_umd);
diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c
index 0b35212ffc3d..6372deae27a0 100644
--- a/kernel/usermode_driver.c
+++ b/kernel/usermode_driver.c
@@ -140,13 +140,23 @@ static void umd_cleanup(struct subprocess_info *info)
 
/* cleanup if umh_setup() was successful but exec failed */
if (info->retval) {
-   fput(umd_info->pipe_to_umh);
-   fput(umd_info->pipe_from_umh);
-   put_pid(umd_info->tgid);
-   umd_info->tgid = NULL;
+   umd_cleanup_helper(umd_info);
}
 }
 
+/**
+ * umd_cleanup_helper - release the resources which allocated in umd_setup
+ * @info: information about usermode driver
+ */
+void umd_cleanup_helper(struct umd_info *info)
+{
+   fput(info->pipe_to_umh);
+   fput(info->pipe_from_umh);
+   put_pid(info->tgid);
+   info->tgid = NULL;
+}
+EXPORT_SYMBOL_GPL(umd_cleanup_helper);
+
 /**
  * fork_usermode_driver - fork a usermode driver
  * @info: information about usermode driver (shouldn't be NULL)
-- 
2.17.1



linux-next: manual merge of the drm-intel tree with the drm tree

2021-03-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm-intel tree got a conflict in:

  drivers/gpu/drm/i915/display/intel_sprite.c

between commit:

  92f1d09ca4ed ("drm: Switch to %p4cc format modifier")

from the drm tree and commit:

  46d12f911821 ("drm/i915: migrate skl planes code new file (v5)")

from the drm-intel tree.

I fixed it up (I used the latter version of the file and applied the
following patch) and can carry the fix as necessary. This is now fixed
as far as linux-next is concerned, but any non trivial conflicts should
be mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.
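
For reference, %p4cc consumes a pointer to the u32 fourcc value, so
conversions like the one in the fix below are mechanical. A minimal
sketch (the drm_dbg_kms() call site is illustrative only):

	u32 fourcc = DRM_FORMAT_XRGB8888;

	drm_dbg_kms(&dev_priv->drm, "using format %p4cc\n", &fourcc);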

From: Stephen Rothwell 
Date: Wed, 17 Mar 2021 14:05:42 +1100
Subject: [PATCH] merge fix for "drm: Switch to %p4cc format modifier"

Signed-off-by: Stephen Rothwell 
---
 drivers/gpu/drm/i915/display/skl_universal_plane.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c 
b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 1f335cb09149..45ceff436bf7 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -1120,7 +1120,6 @@ static int skl_plane_check_fb(const struct 
intel_crtc_state *crtc_state,
struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
const struct drm_framebuffer *fb = plane_state->hw.fb;
unsigned int rotation = plane_state->hw.rotation;
-   struct drm_format_name_buf format_name;
 
if (!fb)
return 0;
@@ -1168,9 +1167,8 @@ static int skl_plane_check_fb(const struct 
intel_crtc_state *crtc_state,
case DRM_FORMAT_XVYU12_16161616:
case DRM_FORMAT_XVYU16161616:
drm_dbg_kms(&dev_priv->drm,
-   "Unsupported pixel format %s for 90/270!\n",
-   drm_get_format_name(fb->format->format,
-   &format_name));
+   "Unsupported pixel format %p4cc for 90/270!\n",
+   &fb->format->format);
return -EINVAL;
default:
break;
-- 
2.30.0

-- 
Cheers,
Stephen Rothwell




[PATCH] media/pci: Assign value when defining variables

2021-03-16 Thread zuoqilin1
From: zuoqilin 

Variables can be assigned their initial values at the point of definition.

Signed-off-by: zuoqilin 
---
 drivers/media/pci/pt1/pt1.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/media/pci/pt1/pt1.c b/drivers/media/pci/pt1/pt1.c
index 72b191c..f2aa368 100644
--- a/drivers/media/pci/pt1/pt1.c
+++ b/drivers/media/pci/pt1/pt1.c
@@ -334,8 +334,7 @@ static int pt1_sync(struct pt1 *pt1)
 static u64 pt1_identify(struct pt1 *pt1)
 {
int i;
-   u64 id;
-   id = 0;
+   u64 id = 0;
for (i = 0; i < 57; i++) {
id |= (u64)(pt1_read_reg(pt1, 0) >> 30 & 1) << i;
pt1_write_reg(pt1, 0, 0x0008);
@@ -1122,8 +1121,7 @@ static int pt1_i2c_end(struct pt1 *pt1, int addr)
 
 static void pt1_i2c_begin(struct pt1 *pt1, int *addrp)
 {
-   int addr;
-   addr = 0;
+   int addr = 0;
 
pt1_i2c_emit(pt1, addr, 0, 0, 1, 1, addr /* itself */);
addr = addr + 1;
-- 
1.9.1




[PATCH] xen/evtchn: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./drivers/xen/evtchn.c:412:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/xen/evtchn.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index c99415a..b1c59bc 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -408,8 +408,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, 
evtchn_port_t port)
 err:
/* bind failed, should close the port now */
close.port = port;
-   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
-   BUG();
+   BUG_ON(HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0);
del_evtchn(u, evtchn);
return rc;
 }
-- 
1.8.3.1
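
One thing worth keeping in mind with such conversions: the condition here
has a side effect (the hypercall), and BUG_ON() preserves it, since it is
defined roughly as follows (shape from include/asm-generic/bug.h; the
config variants differ only in what BUG() expands to):

	#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)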



Re: [PATCH] powerpc: arch/powerpc/kernel/setup_64.c - cleanup warnings

2021-03-16 Thread Daniel Axtens
Hi He Ying,

Thank you for this patch.

I'm not sure what the precise rules for Fixes are, but I wonder if this
should have:

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")

Those are the commits that added the entry_flush and uaccess_flush
symbols. Perhaps one for rfi_flush too but I'm not sure what commit
introduced that.

Kind regards,
Daniel

> warning: symbol 'rfi_flush' was not declared.
> warning: symbol 'entry_flush' was not declared.
> warning: symbol 'uaccess_flush' was not declared.
> We found the warnings above in arch/powerpc/kernel/setup_64.c by using
> the sparse tool.
>
> Define 'entry_flush' and 'uaccess_flush' as static because they are not
> referenced outside the file. Include asm/security_features.h in which
> 'rfi_flush' is declared.
>
> Reported-by: Hulk Robot 
> Signed-off-by: He Ying 
> ---
>  arch/powerpc/kernel/setup_64.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 560ed8b975e7..f92d72a7e7ce 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -68,6 +68,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "setup.h"
>  
> @@ -949,8 +950,8 @@ static bool no_rfi_flush;
>  static bool no_entry_flush;
>  static bool no_uaccess_flush;
>  bool rfi_flush;
> -bool entry_flush;
> -bool uaccess_flush;
> +static bool entry_flush;
> +static bool uaccess_flush;
>  DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
>  EXPORT_SYMBOL(uaccess_flush_key);
>  
> -- 
> 2.17.1


Re: [PATCH v7 2/3] block: add bdev_interposer

2021-03-16 Thread Ming Lei
On Tue, Mar 16, 2021 at 07:35:44PM +0300, Sergei Shtepa wrote:
> The 03/16/2021 11:09, Ming Lei wrote:
> > On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote:
> > > bdev_interposer allows to redirect bio requests to another devices.
> > > 
> > > Signed-off-by: Sergei Shtepa 
> > > ---
> > >  block/bio.c   |  2 ++
> > >  block/blk-core.c  | 57 +++
> > >  block/genhd.c | 54 +
> > >  include/linux/blk_types.h |  3 +++
> > >  include/linux/blkdev.h|  9 +++
> > >  5 files changed, 125 insertions(+)
> > > 
> > > diff --git a/block/bio.c b/block/bio.c
> > > index a1c4d2900c7a..0bfbf06475ee 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio 
> > > *bio_src)
> > >   bio_set_flag(bio, BIO_THROTTLED);
> > >   if (bio_flagged(bio_src, BIO_REMAPPED))
> > >   bio_set_flag(bio, BIO_REMAPPED);
> > > + if (bio_flagged(bio_src, BIO_INTERPOSED))
> > > + bio_set_flag(bio, BIO_INTERPOSED);
> > >   bio->bi_opf = bio_src->bi_opf;
> > >   bio->bi_ioprio = bio_src->bi_ioprio;
> > >   bio->bi_write_hint = bio_src->bi_write_hint;
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index fc60ff208497..da1abc4c27a9 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -1018,6 +1018,55 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio 
> > > *bio)
> > >   return ret;
> > >  }
> > >  
> > > +static noinline blk_qc_t submit_bio_interposed(struct bio *bio)
> > > +{
> > > + blk_qc_t ret = BLK_QC_T_NONE;
> > > + struct bio_list bio_list[2] = { };
> > > + struct gendisk *orig_disk;
> > > +
> > > + if (current->bio_list) {
> > > + bio_list_add(&current->bio_list[0], bio);
> > > + return BLK_QC_T_NONE;
> > > + }
> > > +
> > > + orig_disk = bio->bi_bdev->bd_disk;
> > > + if (unlikely(bio_queue_enter(bio)))
> > > + return BLK_QC_T_NONE;
> > > +
> > > + current->bio_list = bio_list;
> > > +
> > > + do {
> > > + struct block_device *interposer = bio->bi_bdev->bd_interposer;
> > > +
> > > + if (unlikely(!interposer)) {
> > > + /* interposer was removed */
> > > + bio_list_add(&current->bio_list[0], bio);
> > > + break;
> > > + }
> > > + /* assign bio to interposer device */
> > > + bio_set_dev(bio, interposer);
> > > + bio_set_flag(bio, BIO_INTERPOSED);
> > > +
> > > + if (!submit_bio_checks(bio))
> > > + break;
> > > + /*
> > > +  * Because the current->bio_list is initialized,
> > > +  * the submit_bio callback will always return BLK_QC_T_NONE.
> > > +  */
> > > + interposer->bd_disk->fops->submit_bio(bio);
> > 
> > Given original request queue may become live when calling attach() and
> > detach(), see below comment. bdev_interposer_detach() may be run
> > when running ->submit_bio(), meantime the interposer device is
> > gone during the period, then kernel oops.
> 
> I think that since the bio_queue_enter() function was called,
> q->q_usage_counter will not allow the critical code in the attach/detach
> functions to be executed, which is located between the blk_freeze_queue
> and blk_unfreeze_queue calls.
> Please correct me if I'm wrong.
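
To restate the assumption: if the attach side looks roughly like the
sketch below (the real patch's checks and error handling are elided),
then blk_mq_freeze_queue() cannot complete while a submitter still holds
q->q_usage_counter via bio_queue_enter():

	int bdev_interposer_attach(struct block_device *bdev,
				   struct block_device *interposer)
	{
		struct request_queue *q = bdev->bd_disk->queue;

		/* waits until q_usage_counter drains, i.e. until every
		 * submitter that entered via bio_queue_enter() has exited */
		blk_mq_freeze_queue(q);
		bdev->bd_interposer = interposer;
		blk_mq_unfreeze_queue(q);

		return 0;
	}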
> 
> > 
> > > + } while (false);
> > > +
> > > + current->bio_list = NULL;
> > > +
> > > + blk_queue_exit(orig_disk->queue);
> > > +
> > > + /* Resubmit remaining bios */
> > > + while ((bio = bio_list_pop(&bio_list[0])))
> > > + ret = submit_bio_noacct(bio);
> > > +
> > > + return ret;
> > > +}
> > > +
> > >  /**
> > >   * submit_bio_noacct - re-submit a bio to the block device layer for I/O
> > >   * @bio:  The bio describing the location in memory and on the device.
> > > @@ -1029,6 +1078,14 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio 
> > > *bio)
> > >   */
> > >  blk_qc_t submit_bio_noacct(struct bio *bio)
> > >  {
> > > + /*
> > > +  * Checking the BIO_INTERPOSED flag is necessary so that the bio
> > > +  * created by the bdev_interposer do not get to it for processing.
> > > +  */
> > > + if (bdev_has_interposer(bio->bi_bdev) &&
> > > + !bio_flagged(bio, BIO_INTERPOSED))
> > > + return submit_bio_interposed(bio);
> > > +
> > >   if (!submit_bio_checks(bio))
> > >   return BLK_QC_T_NONE;
> > >  
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index c55e8f0fced1..c840ecffea68 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -30,6 +30,11 @@
> > >  static struct kobject *block_depr;
> > >  
> > >  DECLARE_RWSEM(bdev_lookup_sem);
> > > +/*
> > > + * Prevents different block-layer interposers from attaching or detaching
> > > + * to the block device at the same time.
> > > + */
> > > +static DEFINE_MUTEX(bdev_interposer_attach_lock);
> > >  
> > >  /* for extended dynamic devt allocation, currently only one major is 
> > > used */
> 

Re: [PATCH 5.11 000/306] 5.11.7-rc1 review

2021-03-16 Thread Ross Schmidt
On Mon, Mar 15, 2021 at 02:51:03PM +0100, gre...@linuxfoundation.org wrote:
> From: Greg Kroah-Hartman 
> 
> This is the start of the stable review cycle for the 5.11.7 release.
> There are 306 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

2021-03-16 Thread Davidlohr Bueso

On Tue, 16 Mar 2021, Waiman Long wrote:


It was found that running the ww_mutex_lock-torture test produced the
following lockdep splat almost immediately:

[  103.892638] ==
[  103.892639] WARNING: possible circular locking dependency detected
[  103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S  W
[  103.892643] --
[  103.892643] lock_torture_wr/3234 is trying to acquire lock:
[  103.892646] c0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892660]
[  103.892660] but task is already holding lock:
[  103.892661] c0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: 
torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
[  103.892669]
[  103.892669] which lock already depends on the new lock.
[  103.892669]
[  103.892670]
[  103.892670] the existing dependency chain (in reverse order) is:
[  103.892671]
[  103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
[  103.892675]lock_acquire+0x1c5/0x830
[  103.892682]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892687]ww_mutex_lock+0x4b/0x180
[  103.892690]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892694]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892698]kthread+0x35f/0x430
[  103.892701]ret_from_fork+0x1f/0x30
[  103.892706]
[  103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
[  103.892709]lock_acquire+0x1c5/0x830
[  103.892712]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892715]ww_mutex_lock+0x4b/0x180
[  103.892717]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892721]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892725]kthread+0x35f/0x430
[  103.892727]ret_from_fork+0x1f/0x30
[  103.892730]
[  103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
[  103.892733]check_prevs_add+0x3fd/0x2470
[  103.892736]__lock_acquire+0x2602/0x3100
[  103.892738]lock_acquire+0x1c5/0x830
[  103.892740]__ww_mutex_lock.constprop.15+0x1d1/0x2e50
[  103.892743]ww_mutex_lock+0x4b/0x180
[  103.892746]torture_ww_mutex_lock+0x316/0x720 [locktorture]
[  103.892749]lock_torture_writer+0x142/0x3a0 [locktorture]
[  103.892753]kthread+0x35f/0x430
[  103.892755]ret_from_fork+0x1f/0x30
[  103.892757]
[  103.892757] other info that might help us debug this:
[  103.892757]
[  103.892758] Chain exists of:
[  103.892758]   torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> 
torture_ww_mutex_0.base
[  103.892758]
[  103.892763]  Possible unsafe locking scenario:
[  103.892763]
[  103.892764]CPU0CPU1
[  103.892765]
[  103.892765]   lock(torture_ww_mutex_0.base);
[  103.892767]lock(torture_ww_mutex_1.base);
[  103.892770]lock(torture_ww_mutex_0.base);
[  103.892772]   lock(torture_ww_mutex_2.base);
[  103.892774]
[  103.892774]  *** DEADLOCK ***

Since ww_mutex is supposed to be deadlock-proof if used properly, such
a deadlock scenario should not happen. To avoid this false-positive splat,
treat ww_mutex_lock() like a trylock().

After applying this patch, the locktorture test can run for a long time
without triggering the circular locking dependency splat.

Signed-off-by: Waiman Long 


Acked-by: Davidlohr Bueso 
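
For readers unfamiliar with the lockdep side of this: the third argument
of the acquire annotation marks a trylock, and trylock acquisitions are
not added to the dependency chains that produce splats like the one
above. Conceptually the change is (the exact call site in the ww_mutex
code may differ):

	/* before: a normal acquire, participates in dependency chains */
	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);

	/* after: annotated as a trylock (third argument = 1) */
	mutex_acquire_nest(&lock->dep_map, subclass, 1, nest_lock, ip);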


Re: [PATCH 5.10 000/290] 5.10.24-rc1 review

2021-03-16 Thread Ross Schmidt
On Mon, Mar 15, 2021 at 02:51:33PM +0100, gre...@linuxfoundation.org wrote:
> From: Greg Kroah-Hartman 
> 
> This is the start of the stable review cycle for the 5.10.24 release.
> There are 290 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [PATCH] drm: xlnx: call pm_runtime_get_sync before setting pixel clock

2021-03-16 Thread quanyang.wang

Hi Laurent,

On 3/17/21 4:32 AM, Laurent Pinchart wrote:

Hi Quanyang,

Thank you for the patch.

On Wed, Mar 10, 2021 at 12:59:45PM +0800, quanyang.w...@windriver.com wrote:

From: Quanyang Wang 

The Runtime PM subsystem will force the device "fd4a.zynqmp-display"
to enter suspend state while booting if the following conditions are met:
- the usage counter is zero (pm_runtime_get_sync hasn't been called yet)
- no 'active' children (no zynqmp-dp-snd-xx node under dpsub node)
- no other device in the same power domain (dpdma node has no
"power-domains = <_firmware PD_DP>" property)

So there is a scenario as below:
1) DP device enters suspend state   <- call zynqmp_gpd_power_off
2) zynqmp_disp_crtc_setup_clock <- configurate register VPLL_FRAC_CFG
3) pm_runtime_get_sync  <- call zynqmp_gpd_power_on and clear previous
   VPLL_FRAC_CFG configuration
4) clk_prepare_enable(disp->pclk)   <- enable failed since VPLL_FRAC_CFG
   configuration is corrupted

 From the above, we can see that pm_runtime_get_sync may clear the
VPLL_FRAC_CFG register configuration and result in a failure to enable the
clock. Putting pm_runtime_get_sync at the very beginning of the function
zynqmp_disp_crtc_atomic_enable resolves this issue.

Isn't this an issue in the firmware though, which shouldn't clear the
previous VPLLF_FRAC_CFG ?


Thank you for your review. I looked into the ATF and PMU code, and it seems
that the PMU intends to reset the VPLL when the DP power domain goes through
an off/on cycle. (I didn't add debug code to the PMU to confirm that the PMU
really does this; I only called zynqmp_pm_get_pll_frac_data in the kernel to
confirm that the value in the data field of the VPLL_FRAC_CFG register
changes from 0x4000 to 0x0 after running pm_runtime_get_sync.)



Linux                               ATF                     PMU

zynqmp_gpd_power_on
 ->zynqmp_pm_set_requirement
  -->send PM_SET_REQUIREMENT to ATF  ==>ATF sends IPI to PMU  ==>PmSetRequirement
       ->PmRequirementUpdate
        -->PmUpdateSlave(masterReq->slave)
         --->PmSlaveChangeState
          ---->PmSlaveClearAfterState
           ----->PmClockRelease
            ------>PmClockReleaseInt(&...->clock->base)
             ------->clk->class->release(clk)
              -------->PmPllBypassAndReset  // here the VPLL is reset, so VPLL_FRAC_CFG is cleared




Signed-off-by: Quanyang Wang 

Nonetheless, this change looks good to me, I actually had the same patch
in my tree while investigation issues related to the clock rate, so

Reviewed-by: Laurent Pinchart 
Tested-by: Laurent Pinchart 

I was hoping it would solve the issue I'm experiencing with the DP
clock, but that's not the case :-( In a nutshell, when the DP is first
started, the clock frequency is incorrect. The following quick & dirty
patch fixes the problem:

diff --git a/drivers/gpu/drm/xlnx/zynqmp_disp.c 
b/drivers/gpu/drm/xlnx/zynqmp_disp.c
index 74ac0a064eb5..fdbe1b0640aa 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_disp.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_disp.c
@@ -1439,6 +1439,10 @@ zynqmp_disp_crtc_atomic_enable(struct drm_crtc *crtc,

pm_runtime_get_sync(disp->dev);

+   ret = clk_prepare_enable(disp->pclk);
+   if (!ret)
+   clk_disable_unprepare(disp->pclk);
+
zynqmp_disp_crtc_setup_clock(crtc, adjusted_mode);

ret = clk_prepare_enable(disp->pclk);

The problem doesn't seem to be in the kernel, but on the TF-A or PMU
firmware side. Have you experienced this by any chance ?


Yes, I bumped into the same issue and I made a patch (Patch 1) as below.

I didn't send it to the mainline because it doesn't seem to be a driver
issue. The mode of the VPLL is not set correctly because:

1) The VPLL is enabled before Linux runs.

2) Linux calling pm_clock_set_pll_mode can't really set the register,
because the ATF only stores the mode value in a structure and waits for
a clk-enable request to do the register-set operation.

3) Linux calling clk_enable will not send a clk-enable request, since it
sees that the VPLL is already hardware-enabled because of 1).

So the firmware should disable the VPLL when it exits, and the Linux
zynqmp clk driver should also have a checklist to reset some clocks to a
predefined state.

By the way, there is a tiny patch (Patch 2) to fix the black screen
issue in DP. I think you 

Re: [PATCH 1/2] perf/x86/intel: Fix a crash caused by zero PEBS status

2021-03-16 Thread Namhyung Kim
On Fri, Mar 12, 2021 at 05:21:37AM -0800, kan.li...@linux.intel.com wrote:
> From: Kan Liang 
> 
> A repeatable crash can be triggered by the perf_fuzzer on some Haswell
> system.
> https://lore.kernel.org/lkml/7170d3b-c17f-1ded-52aa-cc6d9ae99...@maine.edu/
> 
> For some old CPUs (HSW and earlier), the PEBS status in a PEBS record
> may be mistakenly set to 0. To minimize the impact of the defect, the
> commit was introduced to try to avoid dropping the PEBS record for some
> cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates
> the local pebs_status accordingly. However, it doesn't correct the PEBS
> status in the PEBS record, which may trigger the crash, especially for
> the large PEBS.
> 
> It's possible that all the PEBS records in a large PEBS have the PEBS
> status 0. If so, the first get_next_pebs_record_by_bit() in
> __intel_pmu_pebs_event() returns NULL, i.e. 'at' is NULL. Since it's a large
> PEBS, the 'count' parameter must be > 1. The second
> get_next_pebs_record_by_bit() will crash.
> 
> Besides the local pebs_status, correct the PEBS status in the PEBS
> record as well.
> 
> Fixes: 01330d7288e0 ("perf/x86: Allow zero PEBS status with only single 
> active event")
> Reported-by: Vince Weaver 
> Suggested-by: Peter Zijlstra (Intel) 
> Signed-off-by: Kan Liang 
> Cc: sta...@vger.kernel.org

Tested-by: Namhyung Kim 

Thanks,
Namhyung


> ---
>  arch/x86/events/intel/ds.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 7ebae18..bcf4fa5 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -2010,7 +2010,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs 
> *iregs, struct perf_sample_d
>*/
>   if (!pebs_status && cpuc->pebs_enabled &&
>   !(cpuc->pebs_enabled & (cpuc->pebs_enabled-1)))
> - pebs_status = cpuc->pebs_enabled;
> + pebs_status = p->status = cpuc->pebs_enabled;
>  
>   bit = find_first_bit((unsigned long *)&pebs_status,
>   x86_pmu.max_pebs_events);
> -- 
> 2.7.4
> 


Re: [PATCH v2] scsi: ufs: sysfs: Print string descriptors as raw data

2021-03-16 Thread Martin K. Petersen


Hi Arthur!

> Could you please consider to take this patch?

The patch needs some reviews. I suggest you repost.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 5.4 000/168] 5.4.106-rc1 review

2021-03-16 Thread Ross Schmidt
On Mon, Mar 15, 2021 at 02:53:52PM +0100, gre...@linuxfoundation.org wrote:
> From: Greg Kroah-Hartman 
> 
> This is the start of the stable review cycle for the 5.4.106 release.
> There are 168 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [question] Panic in dax_writeback_one

2021-03-16 Thread chenjun (AM)
在 2021/3/12 1:25, Dan Williams 写道:
> On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox  wrote:
>>
>> On Thu, Mar 11, 2021 at 07:48:25AM +, chenjun (AM) wrote:
>>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>>> *dax_dev, struct address_space *mapping, void *entry)
>>> dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>>> The pfn is returned by the driver. In my case, the pfn does not have
>>> struct page, so pfn_to_page(pfn) returns a wrong address.
>>
>> I wasn't involved, but I think the right solution here is simply to
>> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
>> know why Dan decided to do this in the more complicated way.
> 
> pfn_to_virt() only works for the direct-map. If pages are not mapped I
> don't see how pfn_to_virt() is expected to work.
> 
> The real question Chenjun is why are you writing a new simulator of
> memory as a block-device vs reusing the pmem driver or brd?
> 

Hi Dan

In my case, I do not want to spend memory to create the struct pages for
the memory my driver uses.

And I think this is also a problem for DCSSBLK.

So I want to go back to the older way if CONFIG_FS_DAX_LIMITED:

diff --git a/fs/dax.c b/fs/dax.c
index b3d27fd..6395e84 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas, 
struct dax_device *dax_dev,
  {
unsigned long pfn, index, count;
long ret = 0;
+   void *kaddr;
+   pfn_t new_pfn_t;
+   pgoff_t pgoff;

/*
 * A page got tagged dirty in DAX mapping? Something is seriously
@@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas, 
struct dax_device *dax_dev,
index = xas->xa_index & ~(count - 1);

dax_entry_mkclean(mapping, index, pfn);
-   dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
+
+   if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
+   kaddr = page_address(pfn_to_page(pfn));
+   else {
+   ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn << 
PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);
+   if (ret)
+   goto put_unlocked;
+
+   ret = dax_direct_access(dax_dev, pgoff, count, &kaddr, 
&new_pfn_t);
+   if (ret < 0)
+   goto put_unlocked;
+
+   if (WARN_ON_ONCE(ret < count) || 
WARN_ON_ONCE(pfn_t_to_pfn(new_pfn_t) 
!= pfn)) {
+   ret = -EIO;
+   goto put_unlocked;
+   }
+   }
+
+   dax_flush(dax_dev, kaddr, count * PAGE_SIZE);
/*
 * After we have flushed the cache, we can clear the dirty tag. There
 * cannot be new dirty data in the pfn after the flush has completed as
-- 

-- 
Regards
Chen Jun


Re: [PATCH 4.19 000/120] 4.19.181-rc1 review

2021-03-16 Thread Ross Schmidt
On Mon, Mar 15, 2021 at 02:55:51PM +0100, gre...@linuxfoundation.org wrote:
> From: Greg Kroah-Hartman 
> 
> This is the start of the stable review cycle for the 4.19.181 release.
> There are 120 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


[PATCH] drm/msm: Remove need for reiterating the compatibles

2021-03-16 Thread Bjorn Andersson
After spending a non-negligible time trying to figure out why
dpu_kms_init() would dereference a NULL dpu_kms->pdev, it turns out that
in addition to adding the new compatible to the msm_drv of_match_table
one also needs to teach add_display_components() to register the child
nodes - which includes the DPU platform_device.

Replace the open coded test for compatibles with a check against the
match data of the mdss device to save others this trouble in the future.

Signed-off-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/msm_drv.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 94525ac76d4e..0f6e186a609d 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1173,10 +1173,11 @@ static int compare_name_mdp(struct device *dev, void 
*data)
return (strstr(dev_name(dev), "mdp") != NULL);
 }
 
-static int add_display_components(struct device *dev,
+static int add_display_components(struct platform_device *pdev,
  struct component_match **matchptr)
 {
struct device *mdp_dev;
+   struct device *dev = &pdev->dev;
int ret;
 
/*
@@ -1185,9 +1186,9 @@ static int add_display_components(struct device *dev,
 * Populate the children devices, find the MDP5/DPU node, and then add
 * the interfaces to our components list.
 */
-   if (of_device_is_compatible(dev->of_node, "qcom,mdss") ||
-   of_device_is_compatible(dev->of_node, "qcom,sdm845-mdss") ||
-   of_device_is_compatible(dev->of_node, "qcom,sc7180-mdss")) {
+   switch (get_mdp_ver(pdev)) {
+   case KMS_MDP5:
+   case KMS_DPU:
ret = of_platform_populate(dev->of_node, NULL, NULL, dev);
if (ret) {
DRM_DEV_ERROR(dev, "failed to populate children 
devices\n");
@@ -1206,9 +1207,11 @@ static int add_display_components(struct device *dev,
/* add the MDP component itself */
drm_of_component_match_add(dev, matchptr, compare_of,
   mdp_dev->of_node);
-   } else {
+   break;
+   case KMS_MDP4:
/* MDP4 */
mdp_dev = dev;
+   break;
}
 
ret = add_components_mdp(mdp_dev, matchptr);
@@ -1273,7 +1276,7 @@ static int msm_pdev_probe(struct platform_device *pdev)
int ret;
 
if (get_mdp_ver(pdev)) {
-   ret = add_display_components(&pdev->dev, &match);
+   ret = add_display_components(pdev, &match);
if (ret)
return ret;
}
-- 
2.29.2



linux-next: build warning after merge of the drm tree

2021-03-16 Thread Stephen Rothwell
Hi all,

After merging the drm tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

drivers/gpu/drm/rockchip/rockchip_drm_vop.c: In function 
'vop_plane_atomic_update':
drivers/gpu/drm/rockchip/rockchip_drm_vop.c:882:26: warning: unused variable 
'old_state' [-Wunused-variable]
  882 |  struct drm_plane_state *old_state = 
drm_atomic_get_old_plane_state(state,
  |  ^

Introduced by commit

  977697e20b3d ("drm/atomic: Pass the full state to planes atomic disable and 
update")

-- 
Cheers,
Stephen Rothwell




[PATCH] sparc64: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./arch/sparc/kernel/traps_64.c:419:2-5: WARNING: Use BUG_ON instead of
if condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 arch/sparc/kernel/traps_64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index a850dcc..fbd9097 100644
--- a/arch/sparc/kernel/traps_64.c
+++ b/arch/sparc/kernel/traps_64.c
@@ -415,8 +415,7 @@ static void spitfire_clean_and_reenable_l1_caches(void)
 {
unsigned long va;
 
-   if (tlb_type != spitfire)
-   BUG();
+   BUG_ON(tlb_type != spitfire);
 
/* Clean 'em. */
for (va =  0; va < (PAGE_SIZE << 1); va += 32) {
-- 
1.8.3.1



[PATCH v1 0/3] soc: rockchip: power-domain: add rk3568 powerdomains

2021-03-16 Thread Elaine Zhang
Support power domain function for RK3568 Soc.

Elaine Zhang (3):
  dt-bindings: add power-domain header for RK3568 SoCs
  dt-bindings: Convert the rockchip power_domain to YAML and extend
  soc: rockchip: power-domain: add rk3568 powerdomains

 .../bindings/soc/rockchip/power_domain.txt| 136 
 .../rockchip/rockchip,power-controller.yaml   | 199 ++
 drivers/soc/rockchip/pm_domains.c |  31 +++
 include/dt-bindings/power/rk3568-power.h  |  32 +++
 4 files changed, 262 insertions(+), 136 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
 create mode 100644 
Documentation/devicetree/bindings/soc/rockchip/rockchip,power-controller.yaml
 create mode 100644 include/dt-bindings/power/rk3568-power.h

-- 
2.17.1





[PATCH v1 1/3] dt-bindings: add power-domain header for RK3568 SoCs

2021-03-16 Thread Elaine Zhang
According to the description in the TRM, add all the power domains.

Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/power/rk3568-power.h | 32 
 1 file changed, 32 insertions(+)
 create mode 100644 include/dt-bindings/power/rk3568-power.h

diff --git a/include/dt-bindings/power/rk3568-power.h 
b/include/dt-bindings/power/rk3568-power.h
new file mode 100644
index ..6cc1af1a9d26
--- /dev/null
+++ b/include/dt-bindings/power/rk3568-power.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __DT_BINDINGS_POWER_RK3568_POWER_H__
+#define __DT_BINDINGS_POWER_RK3568_POWER_H__
+
+/* VD_CORE */
+#define RK3568_PD_CPU_0	0
+#define RK3568_PD_CPU_1	1
+#define RK3568_PD_CPU_2	2
+#define RK3568_PD_CPU_3	3
+#define RK3568_PD_CORE_ALIVE   4
+
+/* VD_PMU */
+#define RK3568_PD_PMU  5
+
+/* VD_NPU */
+#define RK3568_PD_NPU  6
+
+/* VD_GPU */
+#define RK3568_PD_GPU  7
+
+/* VD_LOGIC */
+#define RK3568_PD_VI   8
+#define RK3568_PD_VO   9
+#define RK3568_PD_RGA  10
+#define RK3568_PD_VPU  11
+#define RK3568_PD_CENTER   12
+#define RK3568_PD_RKVDEC   13
+#define RK3568_PD_RKVENC   14
+#define RK3568_PD_PIPE 15
+#define RK3568_PD_LOGIC_ALIVE  16
+
+#endif
-- 
2.17.1





[PATCH v1 2/3] dt-bindings: Convert the rockchip power_domain to YAML and extend

2021-03-16 Thread Elaine Zhang
This converts the rockchip power domain family bindings to a YAML schema,
and adds binding documentation for the power domains found on Rockchip
RK3568 SoCs.

Signed-off-by: Elaine Zhang 
---
 .../bindings/soc/rockchip/power_domain.txt| 136 
 .../rockchip/rockchip,power-controller.yaml   | 196 ++
 2 files changed, 196 insertions(+), 136 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
 create mode 100644 
Documentation/devicetree/bindings/soc/rockchip/rockchip,power-controller.yaml

diff --git a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt 
b/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
deleted file mode 100644
index 8304eceb62e4..
--- a/Documentation/devicetree/bindings/soc/rockchip/power_domain.txt
+++ /dev/null
@@ -1,136 +0,0 @@
-* Rockchip Power Domains
-
-Rockchip processors include support for multiple power domains which can be
-powered up/down by software based on different application scenes to save 
power.
-
-Required properties for power domain controller:
-- compatible: Should be one of the following.
-   "rockchip,px30-power-controller" - for PX30 SoCs.
-   "rockchip,rk3036-power-controller" - for RK3036 SoCs.
-   "rockchip,rk3066-power-controller" - for RK3066 SoCs.
-   "rockchip,rk3128-power-controller" - for RK3128 SoCs.
-   "rockchip,rk3188-power-controller" - for RK3188 SoCs.
-   "rockchip,rk3228-power-controller" - for RK3228 SoCs.
-   "rockchip,rk3288-power-controller" - for RK3288 SoCs.
-   "rockchip,rk3328-power-controller" - for RK3328 SoCs.
-   "rockchip,rk3366-power-controller" - for RK3366 SoCs.
-   "rockchip,rk3368-power-controller" - for RK3368 SoCs.
-   "rockchip,rk3399-power-controller" - for RK3399 SoCs.
-- #power-domain-cells: Number of cells in a power-domain specifier.
-   Should be 1 for multiple PM domains.
-- #address-cells: Should be 1.
-- #size-cells: Should be 0.
-
-Required properties for power domain sub nodes:
-- reg: index of the power domain, should use macros in:
-   "include/dt-bindings/power/px30-power.h" - for PX30 type power domain.
-   "include/dt-bindings/power/rk3036-power.h" - for RK3036 type power 
domain.
-   "include/dt-bindings/power/rk3066-power.h" - for RK3066 type power 
domain.
-   "include/dt-bindings/power/rk3128-power.h" - for RK3128 type power 
domain.
-   "include/dt-bindings/power/rk3188-power.h" - for RK3188 type power 
domain.
-   "include/dt-bindings/power/rk3228-power.h" - for RK3228 type power 
domain.
-   "include/dt-bindings/power/rk3288-power.h" - for RK3288 type power 
domain.
-   "include/dt-bindings/power/rk3328-power.h" - for RK3328 type power 
domain.
-   "include/dt-bindings/power/rk3366-power.h" - for RK3366 type power 
domain.
-   "include/dt-bindings/power/rk3368-power.h" - for RK3368 type power 
domain.
-   "include/dt-bindings/power/rk3399-power.h" - for RK3399 type power 
domain.
-- clocks (optional): phandles to clocks which need to be enabled while power 
domain
-   switches state.
-- pm_qos (optional): phandles to qos blocks which need to be saved and restored
-   while power domain switches state.
-
-Qos Example:
-
-   qos_gpu: qos_gpu@ffaf0000 {
-   compatible = "syscon";
-   reg = <0x0 0xffaf0000 0x0 0x20>;
-   };
-
-Example:
-
-   power: power-controller {
-   compatible = "rockchip,rk3288-power-controller";
-   #power-domain-cells = <1>;
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   pd_gpu {
-   reg = <RK3288_PD_GPU>;
-   clocks = <&cru ACLK_GPU>;
-   pm_qos = <&qos_gpu>;
-   };
-   };
-
-power: power-controller {
-compatible = "rockchip,rk3368-power-controller";
-#power-domain-cells = <1>;
-#address-cells = <1>;
-#size-cells = <0>;
-
-pd_gpu_1 {
-reg = <RK3368_PD_GPU_1>;
-clocks = <&cru ACLK_GPU_CFG>;
-};
-};
-
-Example 2:
-   power: power-controller {
-   compatible = "rockchip,rk3399-power-controller";
-   #power-domain-cells = <1>;
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   pd_vio {
-   #address-cells = <1>;
-   #size-cells = <0>;
-   reg = <RK3399_PD_VIO>;
-
-   pd_vo {
-   #address-cells = <1>;
-   #size-cells = <0>;
-   reg = <RK3399_PD_VO>;
-
-   pd_vopb {
-   reg = <RK3399_PD_VOPB>;
-  

[PATCH v1 3/3] soc: rockchip: power-domain: add rk3568 powerdomains

2021-03-16 Thread Elaine Zhang
Add power-domains found on rk3568 socs.

Signed-off-by: Elaine Zhang 
---
 drivers/soc/rockchip/pm_domains.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/soc/rockchip/pm_domains.c 
b/drivers/soc/rockchip/pm_domains.c
index 54eb6cfc5d5b..a2c19c845cf2 100644
--- a/drivers/soc/rockchip/pm_domains.c
+++ b/drivers/soc/rockchip/pm_domains.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct rockchip_domain_info {
int pwr_mask;
@@ -131,6 +132,9 @@ struct rockchip_pmu {
 #define DOMAIN_RK3399(pwr, status, req, wakeup)\
DOMAIN(pwr, status, req, req, req, wakeup)
 
+#define DOMAIN_RK3568(pwr, req, wakeup)\
+   DOMAIN_M(pwr, pwr, req, req, req, wakeup)
+
 static bool rockchip_pmu_domain_is_idle(struct rockchip_pm_domain *pd)
 {
struct rockchip_pmu *pmu = pd->pmu;
@@ -841,6 +845,18 @@ static const struct rockchip_domain_info 
rk3399_pm_domains[] = {
[RK3399_PD_SDIOAUDIO]   = DOMAIN_RK3399(BIT(31), BIT(31), BIT(29), 
true),
 };
 
+static const struct rockchip_domain_info rk3568_pm_domains[] = {
+   [RK3568_PD_NPU] = DOMAIN_RK3568(BIT(1), BIT(2), false),
+   [RK3568_PD_GPU] = DOMAIN_RK3568(BIT(0), BIT(1), false),
+   [RK3568_PD_VI]  = DOMAIN_RK3568(BIT(6), BIT(3), false),
+   [RK3568_PD_VO]  = DOMAIN_RK3568(BIT(7),  BIT(4), false),
+   [RK3568_PD_RGA] = DOMAIN_RK3568(BIT(5),  BIT(5), false),
+   [RK3568_PD_VPU] = DOMAIN_RK3568(BIT(2), BIT(6), false),
+   [RK3568_PD_RKVDEC]  = DOMAIN_RK3568(BIT(4), BIT(8), false),
+   [RK3568_PD_RKVENC]  = DOMAIN_RK3568(BIT(3), BIT(7), false),
+   [RK3568_PD_PIPE]= DOMAIN_RK3568(BIT(8), BIT(11), false),
+};
+
 static const struct rockchip_pmu_info px30_pmu = {
.pwr_offset = 0x18,
.status_offset = 0x20,
@@ -976,6 +992,17 @@ static const struct rockchip_pmu_info rk3399_pmu = {
.domain_info = rk3399_pm_domains,
 };
 
+static const struct rockchip_pmu_info rk3568_pmu = {
+   .pwr_offset = 0xa0,
+   .status_offset = 0x98,
+   .req_offset = 0x50,
+   .idle_offset = 0x68,
+   .ack_offset = 0x60,
+
+   .num_domains = ARRAY_SIZE(rk3568_pm_domains),
+   .domain_info = rk3568_pm_domains,
+};
+
 static const struct of_device_id rockchip_pm_domain_dt_match[] = {
{
.compatible = "rockchip,px30-power-controller",
@@ -1021,6 +1048,10 @@ static const struct of_device_id 
rockchip_pm_domain_dt_match[] = {
.compatible = "rockchip,rk3399-power-controller",
.data = (void *)&rk3399_pmu,
},
+   {
+   .compatible = "rockchip,rk3568-power-controller",
+   .data = (void *)&rk3568_pmu,
+   },
{ /* sentinel */ },
 };
 
-- 
2.17.1





Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

2021-03-16 Thread Kai-Heng Feng
Hi,

On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik  wrote:
>
> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
>
> This patch causes a panic when rebooting my Dell Poweredge r440.  I do
> not have the full panic log as it's lost at that stage of the reboot and
> I do not have a serial console.  Reverting this patch makes my system
> able to reboot again.

But this patch also helps many HP laptops, so maybe we should figure
out what's going on on Poweredge r440.
Does it also panic on shutdown?

Kai-Heng

>
> Signed-off-by: Josef Bacik 
> ---
> - apologies, I mistyped the lkml list email.
>
>  kernel/reboot.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index eb1b15850761..a6ad5eb2fa73 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -244,8 +244,6 @@ void migrate_to_reboot_cpu(void)
>  void kernel_restart(char *cmd)
>  {
> kernel_restart_prepare(cmd);
> -   if (pm_power_off_prepare)
> -   pm_power_off_prepare();
> migrate_to_reboot_cpu();
> syscore_shutdown();
> if (!cmd)
> --
> 2.26.2
>


RE: RE: [PATCH v5 05/10] scsi: ufshpb: Region inactivation in host mode

2021-03-16 Thread Daejun Park
>> >> ---
>> >>  drivers/scsi/ufs/ufshpb.c | 14 ++
>> >>  drivers/scsi/ufs/ufshpb.h |  1 +
>> >>  2 files changed, 15 insertions(+)
>> >>
>> >> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
>> >> index 6f4fd22eaf2f..0744feb4d484 100644
>> >> --- a/drivers/scsi/ufs/ufshpb.c
>> >> +++ b/drivers/scsi/ufs/ufshpb.c
>> >> @@ -907,6 +907,7 @@ static int ufshpb_execute_umap_req(struct
>> >> ufshpb_lu *hpb,
>> >>
>> >>  blk_execute_rq_nowait(q, NULL, req, 1, ufshpb_umap_req_compl_fn);
>> >>
>> >> +hpb->stats.umap_req_cnt++;
>> >>  return 0;
>> >>  }
>> >>
>> >> @@ -1103,6 +1104,12 @@ static int ufshpb_issue_umap_req(struct
>> >> ufshpb_lu *hpb,
>> >>  return -EAGAIN;
>> >>  }
>> >>
>> >> +static int ufshpb_issue_umap_single_req(struct ufshpb_lu *hpb,
>> >> +struct ufshpb_region *rgn)
>> >> +{
>> >> +return ufshpb_issue_umap_req(hpb, rgn);
>> >> +}
>> >> +
>> >>  static int ufshpb_issue_umap_all_req(struct ufshpb_lu *hpb)
>> >>  {
>> >>  return ufshpb_issue_umap_req(hpb, NULL);
>> >> @@ -1115,6 +1122,10 @@ static void __ufshpb_evict_region(struct
>> >> ufshpb_lu *hpb,
>> >>  struct ufshpb_subregion *srgn;
>> >>  int srgn_idx;
>> >>
>> >> +
>> >> +if (hpb->is_hcm && ufshpb_issue_umap_single_req(hpb, rgn))
>> >
>> > __ufshpb_evict_region() is called with rgn_state_lock held and IRQs
>> > disabled. When ufshpb_issue_umap_single_req() invokes
>> > blk_execute_rq_nowait(), the warning below shall pop up every time - fix it?
>> >
>> > void blk_execute_rq_nowait(struct request_queue *q, struct gendisk
>> > *bd_disk,
>> >  struct request *rq, int at_head,
>> >  rq_end_io_fn *done)
>> > {
>> >   WARN_ON(irqs_disabled());
>> > ...
>> >
>> 
>> Moreover, since we are here with rgn_state_lock held and IRQ disabled,
>> in ufshpb_get_req(), rq = kmem_cache_alloc(hpb->map_req_cache,
>> GFP_KERNEL)
>> has the GFP_KERNEL flag, scheduling while atomic???
>I think your comment applies to ufshpb_issue_umap_all_req as well,
>which is called from slave_configure/scsi_add_lun.
> 
>Since the host-mode series is utilizing the framework laid by the device-mode,
>maybe you can add this comment to Daejun's last version?

Hi Avri, Can Guo

I think ufshpb_issue_umap_single_req() can be moved to the end of
ufshpb_evict_region().
Then we can avoid holding rgn_state_lock when it sends the unmap command.
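
For illustration, a rough sketch of that reordering (untested; victim
selection and error handling are elided):

	static int ufshpb_evict_region(struct ufshpb_lu *hpb,
				       struct ufshpb_region *rgn)
	{
		unsigned long flags;
		int ret = 0;

		spin_lock_irqsave(&hpb->rgn_state_lock, flags);
		/* ... state checks and __ufshpb_evict_region() ... */
		spin_unlock_irqrestore(&hpb->rgn_state_lock, flags);

		/* issue the unmap outside the lock, with IRQs enabled */
		if (hpb->is_hcm)
			ret = ufshpb_issue_umap_single_req(hpb, rgn);

		return ret;
	}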

Thanks,
Daejun


>Thanks,
>Avri
> 
>> 
>> Can Guo.
>> 
>> > Thanks.
>> > Can Guo.
>> >
>> >> +return;
>> >> +
>> >>  lru_info = &hpb->lru_info;
>> >>
>> >>  dev_dbg(&hpb->sdev_ufs_lu->sdev_dev, "evict region %d\n",
>> >> rgn->rgn_idx);
>> >> @@ -1855,6 +1866,7 @@ ufshpb_sysfs_attr_show_func(rb_noti_cnt);
>> >>  ufshpb_sysfs_attr_show_func(rb_active_cnt);
>> >>  ufshpb_sysfs_attr_show_func(rb_inactive_cnt);
>> >>  ufshpb_sysfs_attr_show_func(map_req_cnt);
>> >> +ufshpb_sysfs_attr_show_func(umap_req_cnt);
>> >>
>> >>  static struct attribute *hpb_dev_stat_attrs[] = {
>> >>  &dev_attr_hit_cnt.attr,
>> >> @@ -1863,6 +1875,7 @@ static struct attribute *hpb_dev_stat_attrs[] =
>> >> {
>> >>  &dev_attr_rb_active_cnt.attr,
>> >>  &dev_attr_rb_inactive_cnt.attr,
>> >>  &dev_attr_map_req_cnt.attr,
>> >> +&dev_attr_umap_req_cnt.attr,
>> >>  NULL,
>> >>  };
>> >>
>> >> @@ -1978,6 +1991,7 @@ static void ufshpb_stat_init(struct ufshpb_lu
>> >> *hpb)
>> >>  hpb->stats.rb_active_cnt = 0;
>> >>  hpb->stats.rb_inactive_cnt = 0;
>> >>  hpb->stats.map_req_cnt = 0;
>> >> +hpb->stats.umap_req_cnt = 0;
>> >>  }
>> >>
>> >>  static void ufshpb_param_init(struct ufshpb_lu *hpb)
>> >> diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
>> >> index bd4308010466..84598a317897 100644
>> >> --- a/drivers/scsi/ufs/ufshpb.h
>> >> +++ b/drivers/scsi/ufs/ufshpb.h
>> >> @@ -186,6 +186,7 @@ struct ufshpb_stats {
>> >>  u64 rb_inactive_cnt;
>> >>  u64 map_req_cnt;
>> >>  u64 pre_req_cnt;
>> >> +u64 umap_req_cnt;
>> >>  };
>> >>
>> >>  struct ufshpb_lu {
> 
> 
>  


[PATCH] sched: replace if (cond) BUG() with BUG_ON()

2021-03-16 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./kernel/sched/core.c:8039:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 kernel/sched/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9819121..7392bc0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8035,8 +8035,7 @@ void __init sched_init_smp(void)
mutex_unlock(_domains_mutex);
 
/* Move init over to a non-isolated CPU */
-   if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) 
< 0)
-   BUG();
+   BUG_ON(set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) 
< 0);
sched_init_granularity();
 
init_sched_rt_class();
-- 
1.8.3.1



Re: [PATCH v29 4/4] scsi: ufs: Add HPB 2.0 support

2021-03-16 Thread Can Guo

On 2021-03-17 09:42, Daejun Park wrote:

On 2021-03-15 15:23, Can Guo wrote:

On 2021-03-15 15:07, Daejun Park wrote:

This patch supports the HPB 2.0.

HPB 2.0 supports reads of varying sizes from 4KB to 512KB.
A read of up to 32KB is supported as a single HPB read.
A read of 36KB ~ 512KB is supported as a combination of a write buffer
command and an HPB read command, to deliver more PPNs.
The write buffer commands may not be issued immediately due to busy tags.
To use HPB read more aggressively, the driver can requeue the write buffer
command. The requeue threshold is implemented as a timeout and can be
modified via the requeue_timeout_ms entry in sysfs.

Signed-off-by: Daejun Park 
---
+static struct attribute *hpb_dev_param_attrs[] = {
+&dev_attr_requeue_timeout_ms.attr,
+NULL,
+};
+
+struct attribute_group ufs_sysfs_hpb_param_group = {
+.name = "hpb_param_sysfs",
+.attrs = hpb_dev_param_attrs,
+};
+
+static int ufshpb_pre_req_mempool_init(struct ufshpb_lu *hpb)
+{
+struct ufshpb_req *pre_req = NULL;
+int qd = hpb->sdev_ufs_lu->queue_depth / 2;
+int i, j;
+
+INIT_LIST_HEAD(&hpb->lh_pre_req_free);
+
+hpb->pre_req = kcalloc(qd, sizeof(struct ufshpb_req),
GFP_KERNEL);
+hpb->throttle_pre_req = qd;
+hpb->num_inflight_pre_req = 0;
+
+if (!hpb->pre_req)
+goto release_mem;
+
+for (i = 0; i < qd; i++) {
+pre_req = hpb->pre_req + i;
+INIT_LIST_HEAD(&pre_req->list_req);
+pre_req->req = NULL;
+pre_req->bio = NULL;


Why not prepare the bio the same way as wb.m_page? Won't that save more
time for ufshpb_issue_pre_req()?


It is a pre_req pool. So although we prepare the bio at this time, it is
only for the first pre_req.


I meant removing the bio_alloc() in ufshpb_issue_pre_req() and
bio_put()
in ufshpb_pre_req_compl_fn(). bios, in pre_req's case, just hold a
page.
So, prepare 16 (if queue depth is 32) bios here, just use them along
with
wb.m_page and call bio_reset() in ufshpb_pre_req_compl_fn(). Shall it
work?



If it works, you can even have the bio_add_pc_page() called here. Later in
ufshpb_execute_pre_req(), you don't need to call
ufshpb_pre_req_add_bio_page(), just call ufshpb_prep_entry() once instead -
it saves many repeated steps for a pre_req, and you don't even need to call
bio_reset() in this case, since for a bio, nothing changes after it is bound
to a specific page...
bio, nothing changes after it is binded with a specific page...


Hi, Can Guo

I tried the idea that you suggested, but it doesn't work properly.
This optimization can be done later as an enhancement.


Can you elaborate please? Any error seen?

Per my understanding, in the case of pre_reqs, a bio is no different
from a page. Here it can reserve 16 pages for later use, which can be
done the same way for bios.

This is not an enhancement, but a doubt - why not? Unless it is not 
doable.


Thanks,
Can Guo.



Thanks
Daejun


Can Guo.


Thanks,
Can Guo.


After a bio is used, it has to be prepared again at the issue phase.

Thanks,
Daejun



Thanks,
Can Guo.


+
+pre_req->wb.m_page = alloc_page(GFP_KERNEL |
__GFP_ZERO);
+if (!pre_req->wb.m_page) {
+for (j = 0; j < i; j++)
+
__free_page(hpb->pre_req[j].wb.m_page);
+
+goto release_mem;
+}
+list_add_tail(&pre_req->list_req,
&hpb->lh_pre_req_free);
+}
+
+return 0;
+release_mem:
+kfree(hpb->pre_req);
+return -ENOMEM;
+}
+










[PATCH 2/2] platform_x86: intel_pmt_crashlog: Fix incorrect macros

2021-03-16 Thread David E. Box
Fixes off-by-one bugs in the macro assignments for the crashlog control
bits. The driver was initially tested on emulation, but the bugs were
revealed after testing on silicon.

Fixes: 5ef9998c96b0 ("platform/x86: Intel PMT Crashlog capability driver")
Signed-off-by: David E. Box 
---
 drivers/platform/x86/intel_pmt_crashlog.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/platform/x86/intel_pmt_crashlog.c 
b/drivers/platform/x86/intel_pmt_crashlog.c
index 97dd749c8290..92d315a16cfd 100644
--- a/drivers/platform/x86/intel_pmt_crashlog.c
+++ b/drivers/platform/x86/intel_pmt_crashlog.c
@@ -23,18 +23,17 @@
 #define CRASH_TYPE_OOBMSM  1
 
 /* Control Flags */
-#define CRASHLOG_FLAG_DISABLE  BIT(27)
+#define CRASHLOG_FLAG_DISABLE  BIT(28)
 
 /*
- * Bits 28 and 29 control the state of bit 31.
+ * Bits 29 and 30 control the state of bit 31.
  *
- * Bit 28 will clear bit 31, if set, allowing a new crashlog to be captured.
- * Bit 29 will immediately trigger a crashlog to be generated, setting bit 31.
- * Bit 30 is read-only and reserved as 0.
+ * Bit 29 will clear bit 31, if set, allowing a new crashlog to be captured.
+ * Bit 30 will immediately trigger a crashlog to be generated, setting bit 31.
  * Bit 31 is the read-only status with a 1 indicating log is complete.
  */
-#define CRASHLOG_FLAG_TRIGGER_CLEAR	BIT(28)
-#define CRASHLOG_FLAG_TRIGGER_EXECUTE	BIT(29)
+#define CRASHLOG_FLAG_TRIGGER_CLEAR	BIT(29)
+#define CRASHLOG_FLAG_TRIGGER_EXECUTE	BIT(30)
 #define CRASHLOG_FLAG_TRIGGER_COMPLETE BIT(31)
 #define CRASHLOG_FLAG_TRIGGER_MASK GENMASK(31, 28)
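
For reference, a hypothetical trigger sequence using the corrected bits
(the control register offset and MMIO base are illustrative, not from the
driver):

	u32 ctrl = readl(base + CONTROL_OFFSET);

	if (ctrl & CRASHLOG_FLAG_TRIGGER_COMPLETE) {	/* bit 31: log done */
		ctrl &= ~CRASHLOG_FLAG_TRIGGER_MASK;
		ctrl |= CRASHLOG_FLAG_TRIGGER_CLEAR;	/* bit 29 clears bit 31 */
		writel(ctrl, base + CONTROL_OFFSET);
	}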
 
-- 
2.25.1



Re: [PATCH v5 07/10] scsi: ufshpb: Add "Cold" regions timer

2021-03-16 Thread Can Guo

On 2021-03-16 17:21, Avri Altman wrote:

> +static void ufshpb_read_to_handler(struct work_struct *work)
> +{
> + struct delayed_work *dwork = to_delayed_work(work);
> + struct ufshpb_lu *hpb;
> + struct victim_select_info *lru_info;
> + struct ufshpb_region *rgn;
> + unsigned long flags;
> + LIST_HEAD(expired_list);
> +
> + hpb = container_of(dwork, struct ufshpb_lu, ufshpb_read_to_work);
> +
> + spin_lock_irqsave(&hpb->rgn_state_lock, flags);
> +
> + lru_info = &hpb->lru_info;
> +
> + list_for_each_entry(rgn, &lru_info->lh_lru_rgn, list_lru_rgn) {
> + bool timedout = ktime_after(ktime_get(), rgn->read_timeout);
> +
> + if (timedout) {
> + rgn->read_timeout_expiries--;
> + if (is_rgn_dirty(rgn) ||
> + rgn->read_timeout_expiries == 0)
> + list_add(&rgn->list_expired_rgn, &expired_list);
> + else
> + rgn->read_timeout = ktime_add_ms(ktime_get(),
> +  READ_TO_MS);
> + }
> + }
> +
> + spin_unlock_irqrestore(&hpb->rgn_state_lock, flags);
> +
> + list_for_each_entry(rgn, &expired_list, list_expired_rgn) {

This can be problematic - since you don't have the local expired_list
initialized before use, if the above loop did not insert anything into
expired_list, this shall become a dead loop here.

Not sure what you meant by native initialization.
LIST_HEAD statically initializes an empty list, resulting in the same
outcome as INIT_LIST_HEAD.



Sorry for making you confused - you should use list_for_each_entry_safe()
instead of list_for_each_entry(), as you are deleting entries within the
loop; otherwise, this can become an infinite loop. Again, have you tested
this patch before uploading? I am sure this is problematic - when it
becomes an infinite loop, the path below will hang...

ufshcd_suspend()->ufshpb_suspend()->cancel_jobs()->cancel_delayed_work()
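
For reference, a sketch of the safe variant (names as in the patch; the
loop body is abbreviated):

	struct ufshpb_region *rgn, *next;

	/* 'next' caches the following entry, so deleting 'rgn' is safe */
	list_for_each_entry_safe(rgn, next, &expired_list, list_expired_rgn) {
		list_del_init(&rgn->list_expired_rgn);
		/* ... move the region to the inactive list as in the patch ... */
	}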



And, which lock is protecting rgn->list_expired_rgn? If two
read_to_handler works are running in parallel, one can be inserting it
into its expired_list while another can be deleting it.

The timeout handler, being a delayed work, is meant to run once every
polling period.

Originally, I had it protected from two handlers running concurrently,
but I removed that following Daejun's comment, which I accepted,
since it is always scheduled using the same polling period.


But one can set the delay to 0 through sysfs, right?

Thanks,
Can Guo.



Thanks,
Avri



Can Guo.

> + list_del_init(&rgn->list_expired_rgn);
> + spin_lock_irqsave(&hpb->rsp_list_lock, flags);
> + ufshpb_update_inactive_info(hpb, rgn->rgn_idx);
> + hpb->stats.rb_inactive_cnt++;
> + spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
> + }
> +
> + ufshpb_kick_map_work(hpb);
> +
> + schedule_delayed_work(&hpb->ufshpb_read_to_work,
> +   msecs_to_jiffies(POLLING_INTERVAL_MS));
> +}
> +
>  static void ufshpb_add_lru_info(struct victim_select_info *lru_info,
>   struct ufshpb_region *rgn)
>  {
>   rgn->rgn_state = HPB_RGN_ACTIVE;
>   list_add_tail(&rgn->list_lru_rgn, &lru_info->lh_lru_rgn);
>   atomic_inc(&lru_info->active_cnt);
> + if (rgn->hpb->is_hcm) {
> + rgn->read_timeout = ktime_add_ms(ktime_get(), READ_TO_MS);
> + rgn->read_timeout_expiries = READ_TO_EXPIRIES;
> + }
>  }
>
>  static void ufshpb_hit_lru_info(struct victim_select_info *lru_info,
> @@ -1813,6 +1865,7 @@ static int ufshpb_alloc_region_tbl(struct
> ufs_hba *hba, struct ufshpb_lu *hpb)
>
>   INIT_LIST_HEAD(&rgn->list_inact_rgn);
>   INIT_LIST_HEAD(&rgn->list_lru_rgn);
> + INIT_LIST_HEAD(&rgn->list_expired_rgn);
>
>   if (rgn_idx == hpb->rgns_per_lu - 1) {
>   srgn_cnt = ((hpb->srgns_per_lu - 1) %
> @@ -1834,6 +1887,7 @@ static int ufshpb_alloc_region_tbl(struct
> ufs_hba *hba, struct ufshpb_lu *hpb)
>   }
>
>   rgn->rgn_flags = 0;
> + rgn->hpb = hpb;
>   }
>
>   return 0;
> @@ -2053,6 +2107,8 @@ static int ufshpb_lu_hpb_init(struct ufs_hba
> *hba, struct ufshpb_lu *hpb)
> ufshpb_normalization_work_handler);
> + INIT_WORK(&hpb->ufshpb_lun_reset_work,
> ufshpb_reset_work_handler);
> + INIT_DELAYED_WORK(&hpb->ufshpb_read_to_work,
> +   ufshpb_read_to_handler);
>   }
>
>   hpb->map_req_cache = kmem_cache_create("ufshpb_req_cache",
> @@ -2087,6 +2143,10 @@ static int ufshpb_lu_hpb_init(struct ufs_hba
> *hba, struct ufshpb_lu *hpb)
>   ufshpb_stat_init(hpb);
>   ufshpb_param_init(hpb);
>
> + if (hpb->is_hcm)
> + schedule_delayed_work(&hpb->ufshpb_read_to_work,
> +   msecs_to_jiffies(POLLING_INTERVAL_MS));
> +
>   return 0;
>
>  

[PATCH 1/2] platform/x86: intel_pmt_class: Initial resource to 0

2021-03-16 Thread David E. Box
Initialize the struct resource in intel_pmt_dev_register to zero to avoid a
fault should the char *name field be non-zero.
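
As an illustration of the failure mode (a sketch, not the driver code):

	struct resource res;		/* stack garbage; res.name is indeterminate */
	struct resource res2 = {0};	/* zeroed; res2.name is NULL */

Any later code that tests or prints the name pointer can fault on the
garbage value, while a zero-initialized struct keeps the NULL check
reliable.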

Signed-off-by: David E. Box 
---

Base commit is v5.12-rc3.

 drivers/platform/x86/intel_pmt_class.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/x86/intel_pmt_class.c 
b/drivers/platform/x86/intel_pmt_class.c
index c8939fba4509..ee2b3bbeb83d 100644
--- a/drivers/platform/x86/intel_pmt_class.c
+++ b/drivers/platform/x86/intel_pmt_class.c
@@ -173,7 +173,7 @@ static int intel_pmt_dev_register(struct intel_pmt_entry 
*entry,
  struct intel_pmt_namespace *ns,
  struct device *parent)
 {
-   struct resource res;
+   struct resource res = {0};
struct device *dev;
int ret;
 

base-commit: 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
-- 
2.25.1



[PATCH] iio:imu:mpu6050: Modify matricies to matrices

2021-03-16 Thread Guoqing chi
From: Guoqing Chi 

The plural of "matrix" is "matrices".

Signed-off-by: Guoqing Chi 
---
 include/linux/platform_data/invensense_mpu6050.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/platform_data/invensense_mpu6050.h 
b/include/linux/platform_data/invensense_mpu6050.h
index 93974f4cfba1..f05b37521f67 100644
--- a/include/linux/platform_data/invensense_mpu6050.h
+++ b/include/linux/platform_data/invensense_mpu6050.h
@@ -12,7 +12,7 @@
  * mounting matrix retrieved from device-tree)
  *
  * Contains platform specific information on how to configure the MPU6050 to
- * work on this platform.  The orientation matricies are 3x3 rotation matricies
+ * work on this platform.  The orientation matrices are 3x3 rotation matrices
  * that are applied to the data to rotate from the mounting orientation to the
  * platform orientation.  The values must be one of 0, 1, or -1 and each row 
and
  * column should have exactly 1 non-zero value.
-- 
2.17.1



[tip:irq/core] BUILD SUCCESS 5c982c58752118b6c1f295024d3fda5ff22d3c52

2021-03-16 Thread kernel test robot
defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
nds32 allnoconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a001-20210316
i386 randconfig-a005-20210316
i386 randconfig-a002-20210316
i386 randconfig-a003-20210316
i386 randconfig-a004-20210316
i386 randconfig-a006-20210316
x86_64 randconfig-a011-20210316
x86_64 randconfig-a016-20210316
x86_64 randconfig-a013-20210316
x86_64 randconfig-a014-20210316
x86_64 randconfig-a015-20210316
x86_64 randconfig-a012-20210316
i386 randconfig-a013-20210316
i386 randconfig-a016-20210316
i386 randconfig-a011-20210316
i386 randconfig-a012-20210316
i386 randconfig-a015-20210316
i386 randconfig-a014-20210316
riscv nommu_k210_defconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv rv32_defconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec

clang tested configs:
x86_64   randconfig-a006-20210316
x86_64   randconfig-a001-20210316
x86_64   randconfig-a005-20210316
x86_64   randconfig-a004-20210316
x86_64   randconfig-a003-20210316
x86_64   randconfig-a002-20210316

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache

2021-03-16 Thread Linus Torvalds
On Tue, Mar 16, 2021 at 7:12 PM Josef Bacik  wrote:
>
>
> Yeah it's just a flag, we use it to tell that the page is part of a range that
> has been allocated for IO.  The lifetime of the flag is independent of
> the page, but the page is generally either dirty or under writeback, so
> either it goes through truncate and we clear PagePrivate2 there, or it
> actually goes through IO and is cleared before we drop the page in our
> endio.

Ok, that's what it looked like from my very limited "looking at a
couple of grep cases", but I didn't go any further than that.

> We _always_ have PG_private set on the page as long as we own it, and
> PG_private_2 is only set in this IO related context, so we're safe
> there because of the rules around PG_dirty/PG_writeback. We don't need
> it to have an extra ref for it being set.

Perfect. That means that at least as far as btrfs is concerned, we
could trivially remove PG_private_2 from that page_has_private() math
- you'd always see the same result anyway, exactly because you have
PG_private set.
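
For reference, the page_has_private() math in question is roughly the
following (quoted from memory of include/linux/page-flags.h of that era,
so treat it as a sketch):

	#define PAGE_FLAGS_PRIVATE \
		(1UL << PG_private | 1UL << PG_private_2)

	static inline int page_has_private(struct page *page)
	{
		return !!(page->flags & PAGE_FLAGS_PRIVATE);
	}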

And as far as I can tell, fscache doesn't want that PG_private_2 bit
to interact with the random VM lifetime or migration rules either, and
should rely entirely on the page count. David?

There's actually a fair number of page_has_private() users, so we'd
better make sure that's the case. But it's simplified by this bit really
only being used by btrfs (which doesn't care) and fscache, so this
cleanup would basically be entirely up to the whole fscache
series.

Hmm? Objections?

Linus


RE: [PATCH] x86/kaslr: try process e820 entries if can not get suitable regions from efi

2021-03-16 Thread linfeng (M)

On Wed, 17 Mar 09:54, Lin Feng  wrote:
After more than one month of testing, we find that it is not suitable to
process e820 directly in kexec to place the kernel code. Some regions, like
the tpm log and memattr tables, are not marked as reserved in e820.
Take the tpm log, for example: the memory of the table is marked as
E820_TYPE_RAM in e820 but EFI_LOADER_DATA in efi. I wonder why it is not
marked as E820_TYPE_RESERVED, which is contrary to our expectations. So
processing e820 directly in kexec is against the principles of the commit
0982adc74673 ("x86/boot/KASLR: Work around firmware bugs by excluding
EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice").
We try to avoid placing kernel code or data in the tpm log memory range in
kexec. But unfortunately, the EFI info is not runtime, so it is dropped in
efi_map_regions(). We can not get the info in kexec.
Anyway, we still haven't found a suitable solution. Any ideas, friends?
> On Wed, 6 Jan 2021 03:04, Lin Feng  wrote:
> >
> > On Tue, Jan 05, 2021 at 09:54:52AM +0100, Ard Biesheuvel wrote:
> > > (cc Arvind)
> > >
> > > On Tue, 5 Jan 2021 at 09:54, Lin Feng  wrote:
> > > >
> > > > On efi64 x86_64 system, the EFI_CONVENTIONAL_MEMORY regions will not
> > > > be mapped when making EFI runtime calls. So kexec-tools can not
> > > > get these from /sys/firmware/efi/runtime-map. Then compressed boot
> > > > os can not get suitable regions in process_efi_entries and print
> > > > debug message as follow:
> > > > Physical KASLR disabled: no suitable memory region!
> > > > To enable physical kaslr with kexec, call process_e820_entries
> > > > when no suitable regions in efi memmaps.
> > > >
> > > > Signed-off-by: Lin Feng 
> > > >
> > > > ---
> > > >
> > > > I find a regular pattern of kernel code and data placement with kexec.
> > > > It seems unsafe. The reason is shown above.
> > > >
> > > > I'm not familiar with efi firmware. I wonder if there are some
> > > > risks to get regions according to e820 when there is no suitable
> > > > region in efi memmaps.
> > > > ---
> > > >  arch/x86/boot/compressed/kaslr.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/boot/compressed/kaslr.c
> > > > b/arch/x86/boot/compressed/kaslr.c
> > > > index b92fffbe761f..dbd7244b71aa 100644
> > > > --- a/arch/x86/boot/compressed/kaslr.c
> > > > +++ b/arch/x86/boot/compressed/kaslr.c
> > > > @@ -685,6 +685,7 @@ process_efi_entries(unsigned long minimum,
> > > > unsigned long image_size)  {
> > > > struct efi_info *e = &boot_params->efi_info;
> > > > bool efi_mirror_found = false;
> > > > +   bool efi_mem_region_found = false;
> > > > struct mem_vector region;
> > > > efi_memory_desc_t *md;
> > > > unsigned long pmap;
> > > > @@ -742,12 +743,13 @@ process_efi_entries(unsigned long minimum, 
> > > > unsigned long image_size)
> > > > !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
> > > > continue;
> > > >
> > > > +   efi_mem_region_found = false;
> >^^ this should be true, not false.
> You're right. It should be true here. Thanks for pointing out.
> >
> > Other than that, I think this should be okay. The reason EFI memmap is
> > preferred over E820, according to commit
> >
> >   0982adc74673 ("x86/boot/KASLR: Work around firmware bugs by
> > excluding
> > EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice")
> >
> > was to avoid allocating inside EFI_BOOT_SERVICES/EFI_LOADER_DATA etc.
> > That's not a danger during kexec, and I believe runtime services
> > regions should be marked as reserved in the E820 map, right?
> Yes.
> >
> > Also, something a little fishy-looking here is that the first loop to
> > see if there is any EFI_MEMORY_MORE_RELIABLE region does not apply any
> > of the checks on the memory region type/attributes. If there is a
> > mirror region but it isn't conventional memory, or if it was
> > soft-reserved, we shouldn't be setting efi_mirror_found.
> I think so. And I wonder whether the memory mirror works with kexec and
> kaslr when only this patch is used, because a lot of efi information is
> lost and e820 doesn't have any mirror region information. Due to resource
> constraints, I haven't tested it yet.
> But it seems so.
> >
> >
> > > > region.start = md->phys_addr;
> > > > region.size = md->num_pages << EFI_PAGE_SHIFT;
> > > > if (process_mem_region(&region, minimum, image_size))
> > > > break;
> > > > }
> > > > -   return true;
> > > > +   return efi_mem_region_found;
> > > >  }
> > > >  #else
> > > >  static inline bool
> > > > --
> > > > 2.23.0
> > > >


Re: [PATCH] hpsa: fix boot on ia64 (atomic_t alignment)

2021-03-16 Thread Martin K. Petersen


Arnd,

> Actually that still feels wrong: the annotation of the struct is to
> pack every member, which causes the accesses to be done in byte units on
> architectures that do not have hardware unaligned load/store
> instructions, at least for things like atomic_read() that do not go
> through a cmpxchg() or ll/sc cycle.

> This change may fix itanium, but it's still not correct. Other
> architectures would have already been broken before the recent change,
> but that's not a reason against fixing them now.

I agree. I understand why there are restrictions on fields consumed by
the hardware. But for fields internal to the driver the packing doesn't
make sense to me.
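
To illustrate the concern with a hypothetical layout (not the hpsa
struct):

	struct example {
		u8 tag;
		atomic_t refs;	/* lands at offset 1 when packed: misaligned */
	} __packed;

Taking the address of the packed member yields a misaligned pointer, and
a plain 4-byte atomic_read() through it faults on architectures without
hardware unaligned loads (such as ia64).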

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v6 21/21] selftests/resctrl: Create .gitignore to include resctrl_tests

2021-03-16 Thread Fenghua Yu
Create .gitignore to hold the test file resctrl_tests generated after
compiling.

Suggested-by: Shuah Khan 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Add this patch (Shuah)

 tools/testing/selftests/resctrl/.gitignore | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 tools/testing/selftests/resctrl/.gitignore

diff --git a/tools/testing/selftests/resctrl/.gitignore 
b/tools/testing/selftests/resctrl/.gitignore
new file mode 100644
index ..ab68442b6bc8
--- /dev/null
+++ b/tools/testing/selftests/resctrl/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+resctrl_tests
-- 
2.31.0



[PATCH v6 16/21] selftests/resctrl: Modularize resctrl test suite main() function

2021-03-16 Thread Fenghua Yu
Resctrl test suite main() function does the following things
1. Parses command line arguments passed by user
2. Some setup checks
3. Logic that calls into each unit test
4. Print result and clean up after running each unit test

Introduce wrapper functions for steps 3 and 4 to modularize the main()
function. Adding these wrapper functions makes it easier to add any logic
to each individual test.

Please note that this is a preparatory patch for the next one and no
functional changes are intended.

Suggested-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 .../testing/selftests/resctrl/resctrl_tests.c | 88 ---
 1 file changed, 57 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index 2ace464b96d1..e63e0d8764ef 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -54,10 +54,58 @@ void tests_cleanup(void)
cat_test_cleanup();
 }
 
+static void run_mbm_test(bool has_ben, char **benchmark_cmd, int span,
+int cpu_no, char *bw_report)
+{
+   int res;
+
+   ksft_print_msg("Starting MBM BW change ...\n");
+   if (!has_ben)
+   sprintf(benchmark_cmd[5], "%s", MBA_STR);
+   res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
+   ksft_test_result(!res, "MBM: bw change\n");
+   mbm_test_cleanup();
+}
+
+static void run_mba_test(bool has_ben, char **benchmark_cmd, int span,
+int cpu_no, char *bw_report)
+{
+   int res;
+
+   ksft_print_msg("Starting MBA Schemata change ...\n");
+   if (!has_ben)
+   sprintf(benchmark_cmd[1], "%d", span);
+   res = mba_schemata_change(cpu_no, bw_report, benchmark_cmd);
+   ksft_test_result(!res, "MBA: schemata change\n");
+   mba_test_cleanup();
+}
+
+static void run_cmt_test(bool has_ben, char **benchmark_cmd, int cpu_no)
+{
+   int res;
+
+   ksft_print_msg("Starting CMT test ...\n");
+   if (!has_ben)
+   sprintf(benchmark_cmd[5], "%s", CMT_STR);
+   res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
+   ksft_test_result(!res, "CMT: test\n");
+   cmt_test_cleanup();
+}
+
+static void run_cat_test(int cpu_no, int no_of_bits)
+{
+   int res;
+
+   ksft_print_msg("Starting CAT test ...\n");
+   res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
+   ksft_test_result(!res, "CAT: test\n");
+   cat_test_cleanup();
+}
+
 int main(int argc, char **argv)
 {
bool has_ben = false, mbm_test = true, mba_test = true, cmt_test = true;
-   int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 0;
+   int c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 0;
char *benchmark_cmd[BENCHMARK_ARGS], bw_report[64], bm_type[64];
char benchmark_cmd_area[BENCHMARK_ARGS][BENCHMARK_ARG_SIZE];
int ben_ind, ben_count, tests = 0;
@@ -170,39 +218,17 @@ int main(int argc, char **argv)
 
ksft_set_plan(tests ? : 4);
 
-   if (!is_amd && mbm_test) {
-   ksft_print_msg("Starting MBM BW change ...\n");
-   if (!has_ben)
-   sprintf(benchmark_cmd[5], "%s", MBA_STR);
-   res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
-   ksft_test_result(!res, "MBM: bw change\n");
-   mbm_test_cleanup();
-   }
+   if (!is_amd && mbm_test)
+   run_mbm_test(has_ben, benchmark_cmd, span, cpu_no, bw_report);
 
-   if (!is_amd && mba_test) {
-   ksft_print_msg("Starting MBA Schemata change ...\n");
-   if (!has_ben)
-   sprintf(benchmark_cmd[1], "%d", span);
-   res = mba_schemata_change(cpu_no, bw_report, benchmark_cmd);
-   ksft_test_result(!res, "MBA: schemata change\n");
-   mba_test_cleanup();
-   }
+   if (!is_amd && mba_test)
+   run_mba_test(has_ben, benchmark_cmd, span, cpu_no, bw_report);
 
-   if (cmt_test) {
-   ksft_print_msg("Starting CMT test ...\n");
-   if (!has_ben)
-   sprintf(benchmark_cmd[5], "%s", CMT_STR);
-   res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
-   ksft_test_result(!res, "CMT: test\n");
-   cmt_test_cleanup();
-   }
+   if (cmt_test)
+   run_cmt_test(has_ben, benchmark_cmd, cpu_no);
 
-   if (cat_test) {
-   ksft_print_msg("Starting CAT test ...\n");
-   res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
-   ksft_test_result(!res, "CAT: test\n");
-   cat_test_cleanup();
-   }
+   if (cat_test)
+   run_cat_test(cpu_no, no_of_bits);
 
return ksft_exit_pass();
 }
-- 
2.31.0



[PATCH v6 20/21] selftests/resctrl: Fix checking for < 0 for unsigned values

2021-03-16 Thread Fenghua Yu
Dan reported the following static checker warnings:

tools/testing/selftests/resctrl/resctrl_val.c:545 measure_vals()
warn: 'bw_imc' unsigned <= 0

tools/testing/selftests/resctrl/resctrl_val.c:549 measure_vals()
warn: 'bw_resc_end' unsigned <= 0

These warnings are reported because
1. measure_vals() declares 'bw_imc' and 'bw_resc_end' as unsigned long
   variables
2. Return value of get_mem_bw_imc() and get_mem_bw_resctrl() are assigned
   to 'bw_imc' and 'bw_resc_end' respectively
3. The returned values are checked for <= 0 to see if the calls failed

Checking for < 0 for an unsigned value doesn't make any sense.
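
As an illustration (not the actual test code):

	unsigned long bw = get_bw();	/* hypothetical; returns -1 on error */

	if (bw <= 0)	/* only matches 0; -1 wraps to ULONG_MAX and slips through */
		return bw;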

Fix this issue by changing the implementation of get_mem_bw_imc() and
get_mem_bw_resctrl() such that they now accept reference to a variable
and set the variable appropriately upon success and return 0, else return
< 0 on error.

Reported-by: Dan Carpenter 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/resctrl_val.c | 41 +++
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_val.c 
b/tools/testing/selftests/resctrl/resctrl_val.c
index 20d457c47ded..95224345c78e 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -300,9 +300,9 @@ static int initialize_mem_bw_imc(void)
  * Memory B/W utilized by a process on a socket can be calculated using
  * iMC counters. Perf events are used to read these counters.
  *
- * Return: >= 0 on success. < 0 on failure.
+ * Return: = 0 on success. < 0 on failure.
  */
-static float get_mem_bw_imc(int cpu_no, char *bw_report)
+static int get_mem_bw_imc(int cpu_no, char *bw_report, float *bw_imc)
 {
float reads, writes, of_mul_read, of_mul_write;
int imc, j, ret;
@@ -373,13 +373,18 @@ static float get_mem_bw_imc(int cpu_no, char *bw_report)
close(imc_counters_config[imc][WRITE].fd);
}
 
-   if (strcmp(bw_report, "reads") == 0)
-   return reads;
+   if (strcmp(bw_report, "reads") == 0) {
+   *bw_imc = reads;
+   return 0;
+   }
 
-   if (strcmp(bw_report, "writes") == 0)
-   return writes;
+   if (strcmp(bw_report, "writes") == 0) {
+   *bw_imc = writes;
+   return 0;
+   }
 
-   return (reads + writes);
+   *bw_imc = reads + writes;
+   return 0;
 }
 
 void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
@@ -438,9 +443,8 @@ static void initialize_mem_bw_resctrl(const char *ctrlgrp, 
const char *mongrp,
  * 1. If con_mon grp is given, then read from it
  * 2. If con_mon grp is not given, then read from root con_mon grp
  */
-static unsigned long get_mem_bw_resctrl(void)
+static int get_mem_bw_resctrl(unsigned long *mbm_total)
 {
-   unsigned long mbm_total = 0;
FILE *fp;
 
fp = fopen(mbm_total_path, "r");
@@ -449,7 +453,7 @@ static unsigned long get_mem_bw_resctrl(void)
 
return -1;
}
-   if (fscanf(fp, "%lu", &mbm_total) <= 0) {
+   if (fscanf(fp, "%lu", mbm_total) <= 0) {
perror("Could not get mbm local bytes");
fclose(fp);
 
@@ -457,7 +461,7 @@ static unsigned long get_mem_bw_resctrl(void)
}
fclose(fp);
 
-   return mbm_total;
+   return 0;
 }
 
 pid_t bm_pid, ppid;
@@ -549,7 +553,8 @@ static void initialize_llc_occu_resctrl(const char 
*ctrlgrp, const char *mongrp,
 static int
 measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
 {
-   unsigned long bw_imc, bw_resc, bw_resc_end;
+   unsigned long bw_resc, bw_resc_end;
+   float bw_imc;
int ret;
 
/*
@@ -559,13 +564,13 @@ measure_vals(struct resctrl_val_param *param, unsigned 
long *bw_resc_start)
 * Compare the two values to validate resctrl value.
 * It takes 1sec to measure the data.
 */
-   bw_imc = get_mem_bw_imc(param->cpu_no, param->bw_report);
-   if (bw_imc <= 0)
-   return bw_imc;
+   ret = get_mem_bw_imc(param->cpu_no, param->bw_report, &bw_imc);
+   if (ret < 0)
+   return ret;
 
-   bw_resc_end = get_mem_bw_resctrl();
-   if (bw_resc_end <= 0)
-   return bw_resc_end;
+   ret = get_mem_bw_resctrl(&bw_resc_end);
+   if (ret < 0)
+   return ret;
 
bw_resc = (bw_resc_end - *bw_resc_start) / MB;
ret = print_results_bw(param->filename, bm_pid, bw_imc, bw_resc);
-- 
2.31.0



[PATCH v6 15/21] selftests/resctrl: Don't hard code value of "no_of_bits" variable

2021-03-16 Thread Fenghua Yu
Cache related tests (like CAT and CMT) depend on a variable called
no_of_bits to run. no_of_bits defines the number of contiguous bits
that should be set in the CBM mask and a user can pass a value for
no_of_bits using -n command line argument. If a user hasn't passed any
value, it defaults to 5 (randomly chosen value).

Hard coding no_of_bits to 5 will make the cache tests fail to run on
systems that support maximum cbm mask that is less than or equal to 5 bits.
Hence, don't hard code no_of_bits value.

If a user passes a value for "no_of_bits" using -n option, use it.
Otherwise, no_of_bits is equal to half of the maximum number of bits in
the cbm mask.

Please note that CMT test is still hard coded to 5 bits. It will change in
subsequent patches that change CMT test.

Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/cat_test.c  | 5 -
 tools/testing/selftests/resctrl/resctrl_tests.c | 8 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index 090d3afc7a78..04d706b4f10e 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -130,7 +130,10 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
/* Get max number of bits from default-cabm mask */
count_of_bits = count_bits(long_mask);
 
-   if (n < 1 || n > count_of_bits - 1) {
+   if (!n)
+   n = count_of_bits / 2;
+
+   if (n > count_of_bits - 1) {
ksft_print_msg("Invalid input value for no_of_bits n!\n");
ksft_print_msg("Please enter value in range 1 to %d\n",
   count_of_bits - 1);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index 355bd28b996a..2ace464b96d1 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -57,7 +57,7 @@ void tests_cleanup(void)
 int main(int argc, char **argv)
 {
bool has_ben = false, mbm_test = true, mba_test = true, cmt_test = true;
-   int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 5;
+   int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 0;
char *benchmark_cmd[BENCHMARK_ARGS], bw_report[64], bm_type[64];
char benchmark_cmd_area[BENCHMARK_ARGS][BENCHMARK_ARG_SIZE];
int ben_ind, ben_count, tests = 0;
@@ -110,6 +110,10 @@ int main(int argc, char **argv)
break;
case 'n':
no_of_bits = atoi(optarg);
+   if (no_of_bits <= 0) {
+   printf("Bail out! invalid argument for no_of_bits\n");
+   return -1;
+   }
break;
case 'h':
cmd_help();
@@ -188,7 +192,7 @@ int main(int argc, char **argv)
ksft_print_msg("Starting CMT test ...\n");
if (!has_ben)
sprintf(benchmark_cmd[5], "%s", CMT_STR);
-   res = cmt_resctrl_val(cpu_no, no_of_bits, benchmark_cmd);
+   res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
ksft_test_result(!res, "CMT: test\n");
cmt_test_cleanup();
}
-- 
2.31.0



[PATCH v6 14/21] selftests/resctrl: Fix MBA/MBM results reporting format

2021-03-16 Thread Fenghua Yu
MBM unit test starts fill_buf (default built-in benchmark) in a new con_mon
group (c1, m1) and records resctrl reported mbm values and iMC (Integrated
Memory Controller) values every second. It does this for five seconds
(randomly chosen value) in total. It then calculates average of resctrl_mbm
values and imc_mbm values and if the difference is greater than 300 MB/sec
(randomly chosen value), the test treats it as a failure. MBA unit test is
similar to MBM but after every run it changes schemata.

Checking for a difference of 300 MB/sec doesn't look very meaningful when
the mbm values are changing over a wide range. For example, below are the
values running MBA test on SKL with different allocations

1. With 10% as schemata both iMC and resctrl mbm_values are around 2000
   MB/sec
2. With 100% as schemata both iMC and resctrl mbm_values are around 1
   MB/sec

A 300 MB/sec difference between resctrl_mbm and imc_mbm values is
acceptable at 100% schemata but it isn't acceptable at 10% schemata because
that's a huge difference.

So, fix this by checking for percentage difference instead of absolute
difference i.e. check if the difference between resctrl_mbm value and
imc_mbm value is within 5% (randomly chosen value) of imc_mbm value. If the
difference is greater than 5% of the imc_mbm value, treat it as a failure.

Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/mba_test.c | 22 +-
 tools/testing/selftests/resctrl/mbm_test.c | 15 ---
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c 
b/tools/testing/selftests/resctrl/mba_test.c
index f42d4ba70363..8842d379e886 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -12,7 +12,7 @@
 
 #define RESULT_FILE_NAME   "result_mba"
 #define NUM_OF_RUNS5
-#define MAX_DIFF   300
+#define MAX_DIFF_PERCENT   5
 #define ALLOCATION_MAX 100
 #define ALLOCATION_MIN 10
 #define ALLOCATION_STEP10
@@ -62,7 +62,8 @@ static void show_mba_info(unsigned long *bw_imc, unsigned 
long *bw_resc)
 allocation++) {
unsigned long avg_bw_imc, avg_bw_resc;
unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
-   unsigned long avg_diff;
+   int avg_diff_per;
+   float avg_diff;
 
/*
 * The first run is discarded due to inaccurate value from
@@ -76,16 +77,19 @@ static void show_mba_info(unsigned long *bw_imc, unsigned 
long *bw_resc)
 
avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
-   avg_diff = labs((long)(avg_bw_resc - avg_bw_imc));
+   avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
+   avg_diff_per = (int)(avg_diff * 100);
 
-   ksft_print_msg("%s MBA schemata percentage %u smaller than %d %%\n",
-  avg_diff > MAX_DIFF ? "Fail:" : "Pass:",
-  ALLOCATION_MAX - ALLOCATION_STEP * allocation,
-  MAX_DIFF);
-   ksft_print_msg("avg_diff: %lu\n", avg_diff);
+   ksft_print_msg("%s MBA: diff within %d%% for schemata %u\n",
+  avg_diff_per > MAX_DIFF_PERCENT ?
+  "Fail:" : "Pass:",
+  MAX_DIFF_PERCENT,
+  ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+
+   ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
ksft_print_msg("avg_bw_resc: %lu\n", avg_bw_resc);
-   if (avg_diff > MAX_DIFF)
+   if (avg_diff_per > MAX_DIFF_PERCENT)
failed = true;
}
 
diff --git a/tools/testing/selftests/resctrl/mbm_test.c 
b/tools/testing/selftests/resctrl/mbm_test.c
index 0d65ba4b62b4..651d4ac15986 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -11,7 +11,7 @@
 #include "resctrl.h"
 
 #define RESULT_FILE_NAME   "result_mbm"
-#define MAX_DIFF   300
+#define MAX_DIFF_PERCENT   5
 #define NUM_OF_RUNS5
 
 static int
@@ -19,8 +19,8 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, 
int span)
 {
unsigned long avg_bw_imc = 0, avg_bw_resc = 0;
unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
-   long avg_diff = 0;
-   int runs, ret;
+   int runs, ret, avg_diff_per;
+   float avg_diff = 0;
 
/*
 * Discard the first value which is inaccurate due to monitoring setup
@@ -33,12 +33,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, 
int span)
 
avg_bw_imc = sum_bw_imc / 4;
avg_bw_resc = sum_bw_resc / 4;
- 

[PATCH v6 17/21] selftests/resctrl: Skip the test if requested resctrl feature is not supported

2021-03-16 Thread Fenghua Yu
There could be two reasons why a resctrl feature might not be enabled on
the platform
1. H/W might not support the feature
2. Even if the H/W supports it, the user might have disabled the feature
   through kernel command line arguments

Hence, any resctrl unit test (like cmt, cat, mbm and mba) before starting
the test will first check if the feature is enabled on the platform or not.
If the feature isn't enabled, then the test returns with an error status.
For example, if MBA isn't supported on a platform and if the user tries to
run MBA, the output will look like this

ok mounting resctrl to "/sys/fs/resctrl"
not ok MBA: schemata change

But, not supporting a feature isn't a test failure. So, instead of treating
it as an error, use the SKIP directive of the TAP protocol. With the
change, the output will look as below

ok MBA # SKIP Hardware does not support MBA or MBA is disabled

Suggested-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v6:
- Replace "cat" by CAT_STR and so on (Babu).

 tools/testing/selftests/resctrl/cat_test.c|  3 ---
 tools/testing/selftests/resctrl/mba_test.c|  3 ---
 tools/testing/selftests/resctrl/mbm_test.c|  3 ---
 .../testing/selftests/resctrl/resctrl_tests.c | 23 +++
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index 04d706b4f10e..cd4f68388e0f 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -111,9 +111,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
if (ret)
return ret;
 
-   if (!validate_resctrl_feature_request("cat"))
-   return -1;
-
/* Get default cbm mask for L3/L2 cache */
ret = get_cbm_mask(cache_type, cbm_mask);
if (ret)
diff --git a/tools/testing/selftests/resctrl/mba_test.c 
b/tools/testing/selftests/resctrl/mba_test.c
index 8842d379e886..26f12ad4c663 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -158,9 +158,6 @@ int mba_schemata_change(int cpu_no, char *bw_report, char 
**benchmark_cmd)
 
remove(RESULT_FILE_NAME);
 
-   if (!validate_resctrl_feature_request("mba"))
-   return -1;
-
ret = resctrl_val(benchmark_cmd, );
if (ret)
return ret;
diff --git a/tools/testing/selftests/resctrl/mbm_test.c 
b/tools/testing/selftests/resctrl/mbm_test.c
index 651d4ac15986..02b1ed03f1e5 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -131,9 +131,6 @@ int mbm_bw_change(int span, int cpu_no, char *bw_report, 
char **benchmark_cmd)
 
remove(RESULT_FILE_NAME);
 
-   if (!validate_resctrl_feature_request("mbm"))
-   return -1;
-
ret = resctrl_val(benchmark_cmd, );
if (ret)
return ret;
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index e63e0d8764ef..fb246bc41f47 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -60,6 +60,12 @@ static void run_mbm_test(bool has_ben, char **benchmark_cmd, 
int span,
int res;
 
ksft_print_msg("Starting MBM BW change ...\n");
+
+   if (!validate_resctrl_feature_request(MBM_STR)) {
+   ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n");
+   return;
+   }
+
if (!has_ben)
sprintf(benchmark_cmd[5], "%s", MBA_STR);
res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
@@ -73,6 +79,12 @@ static void run_mba_test(bool has_ben, char **benchmark_cmd, 
int span,
int res;
 
ksft_print_msg("Starting MBA Schemata change ...\n");
+
+   if (!validate_resctrl_feature_request(MBA_STR)) {
+   ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
+   return;
+   }
+
if (!has_ben)
sprintf(benchmark_cmd[1], "%d", span);
res = mba_schemata_change(cpu_no, bw_report, benchmark_cmd);
@@ -85,6 +97,11 @@ static void run_cmt_test(bool has_ben, char **benchmark_cmd, 
int cpu_no)
int res;
 
ksft_print_msg("Starting CMT test ...\n");
+   if (!validate_resctrl_feature_request(CMT_STR)) {
+   ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
+   return;
+   }
+
if (!has_ben)
sprintf(benchmark_cmd[5], "%s", CMT_STR);
res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
@@ -97,6 +114,12 @@ static void run_cat_test(int cpu_no, int no_of_bits)
int res;
 
ksft_print_msg("Starting CAT test ...\n");
+
+   if (!validate_resctrl_feature_request(CAT_STR)) {
+   

[PATCH v6 03/21] selftests/resctrl: Fix compilation issues for other global variables

2021-03-16 Thread Fenghua Yu
Reinette reported following compilation issue on Fedora 32, gcc version
10.1.1

/usr/bin/ld: resctrl_tests.o:/resctrl.h:65: multiple definition
of `bm_pid'; cache.o:/resctrl.h:65: first defined here

Other variables are ppid, tests_run, llc_occup_path, is_amd. Compiler
isn't happy because these variables are defined globally in two .c files
but are not declared as extern.

To fix issues for the global variables, declare them as extern.

Change Log:
- Split this patch from v4's patch 1 (Shuah).

Reported-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/resctrl.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h 
b/tools/testing/selftests/resctrl/resctrl.h
index 959c71e39bdc..12b77182cb44 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -62,11 +62,11 @@ struct resctrl_val_param {
int (*setup)(int num, ...);
 };
 
-pid_t bm_pid, ppid;
-int tests_run;
+extern pid_t bm_pid, ppid;
+extern int tests_run;
 
-char llc_occup_path[1024];
-bool is_amd;
+extern char llc_occup_path[1024];
+extern bool is_amd;
 
 bool check_resctrlfs_support(void);
 int filter_dmesg(void);
-- 
2.31.0



[PATCH v6 19/21] selftests/resctrl: Fix incorrect parsing of iMC counters

2021-03-16 Thread Fenghua Yu
iMC (Integrated Memory Controller) counters are usually at
"/sys/bus/event_source/devices/" and are named as "uncore_imc_<n>".
num_of_imcs() function tries to count number of such iMC counters so that
it could appropriately initialize required number of perf_attr structures
that could be used to read these iMC counters.

num_of_imcs() function assumes that all the directories under this path
that start with "uncore_imc" are iMC counters. But, on some systems there
could be directories named as "uncore_imc_free_running" which aren't iMC
counters. Trying to read from such directories will result in "not found
file" errors and MBM/MBA tests will fail.

Hence, fix the logic in num_of_imcs() such that it looks at the first
character after "uncore_imc_" to check if it's a numerical digit or not. If
it's a digit then the directory represents an iMC counter, else, skip the
directory.

Reported-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/resctrl_val.c | 22 +--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_val.c 
b/tools/testing/selftests/resctrl/resctrl_val.c
index 5dfae51133bc..20d457c47ded 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -221,8 +221,8 @@ static int read_from_imc_dir(char *imc_dir, int count)
  */
 static int num_of_imcs(void)
 {
+   char imc_dir[512], *temp;
unsigned int count = 0;
-   char imc_dir[512];
struct dirent *ep;
int ret;
DIR *dp;
@@ -230,7 +230,25 @@ static int num_of_imcs(void)
dp = opendir(DYN_PMU_PATH);
if (dp) {
while ((ep = readdir(dp))) {
-   if (strstr(ep->d_name, UNCORE_IMC)) {
+   temp = strstr(ep->d_name, UNCORE_IMC);
+   if (!temp)
+   continue;
+
+   /*
+* imc counters are named as "uncore_imc_<n>", hence
+* increment the pointer to point to <n>. Note that
+* sizeof(UNCORE_IMC) would count for the null character as
+* well and hence the last underscore character in
+* "uncore_imc_<n>" need not be counted.
+*/
+   temp = temp + sizeof(UNCORE_IMC);
+
+   /*
+* Some directories under "DYN_PMU_PATH" could have
+* names like "uncore_imc_free_running", hence, check if
+* first character is a numerical digit or not.
+*/
+   if (temp[0] >= '0' && temp[0] <= '9') {
sprintf(imc_dir, "%s/%s/", DYN_PMU_PATH,
ep->d_name);
ret = read_from_imc_dir(imc_dir, count);
-- 
2.31.0



[PATCH v6 18/21] selftests/resctrl: Fix unmount resctrl FS

2021-03-16 Thread Fenghua Yu
umount_resctrlfs() directly attempts to unmount resctrl file system without
checking if resctrl FS is already mounted or not. It returns 0 on success
and on failure it prints an error message and returns an error status.
Calling umount_resctrlfs() when resctrl FS isn't mounted will return an
error status.

There could be situations where-in the caller might not know if resctrl
FS is already mounted or not and the caller might still want to unmount
resctrl FS if it's already mounted (For example during teardown).

To support above use cases, change umount_resctrlfs() such that it now
first checks if resctrl FS is already mounted or not and unmounts resctrl
FS only if it's already mounted.

Also, unmount resctrl FS in main() upon exit; otherwise resctrl FS can be
left mounted. For example, this happens when running only the mba test on
a Broadwell (BDW) machine (MBA isn't supported on BDW CPUs).

This happens because validate_resctrl_feature_request() would mount resctrl
FS to check if mba is enabled on the platform or not and finds that the H/W
doesn't support mba and hence will return false to run_mba_test(). This in
turn makes the main() function return without unmounting resctrl FS.

Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++
 tools/testing/selftests/resctrl/resctrlfs.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index fb246bc41f47..f51b5fc066a3 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -253,5 +253,7 @@ int main(int argc, char **argv)
if (cat_test)
run_cat_test(cpu_no, no_of_bits);
 
+   umount_resctrlfs();
+
return ksft_exit_pass();
 }
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
b/tools/testing/selftests/resctrl/resctrlfs.c
index 26563175acf6..ade5f2b8b843 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -82,6 +82,9 @@ int remount_resctrlfs(bool mum_resctrlfs)
 
 int umount_resctrlfs(void)
 {
+   if (find_resctrl_mount(NULL))
+   return 0;
+
if (umount(RESCTRL_PATH)) {
perror("# Unable to umount resctrl");
 
-- 
2.31.0



[PATCH v6 02/21] selftests/resctrl: Fix compilation issues for global variables

2021-03-16 Thread Fenghua Yu
Reinette reported following compilation issue on Fedora 32, gcc version
10.1.1

/usr/bin/ld: cqm_test.o:/cqm_test.c:22: multiple definition of
`cache_size'; cat_test.o:/cat_test.c:23: first defined here

The same issue is reported for long_mask, cbm_mask, count_of_bits etc
variables as well. Compiler isn't happy because these variables are
defined globally in two .c files namely cqm_test.c and cat_test.c and
the compiler during compilation finds that the variable is already
defined (multiple definition error).

Taking a closer look at the usage of these variables reveals that these
variables are used only locally in functions such as cqm_resctrl_val()
(defined in cqm_test.c) and cat_perf_miss_val() (defined in cat_test.c).
These variables are not shared between those functions. So, there is no
need for these variables to be global. Hence, fix this issue by making
them static variables.

Reported-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Define long_mask, cbm_mask, count_of_bits etc as static variables
  (Shuah).
- Split this patch into patch 2 and 3 (Shuah).

 tools/testing/selftests/resctrl/cat_test.c  | 10 +-
 tools/testing/selftests/resctrl/cqm_test.c  | 10 +-
 tools/testing/selftests/resctrl/resctrl.h   |  2 +-
 tools/testing/selftests/resctrl/resctrlfs.c | 10 +-
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index 5da43767b973..bdeeb5772592 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -17,10 +17,10 @@
 #define MAX_DIFF_PERCENT   4
 #define MAX_DIFF   100
 
-int count_of_bits;
-char cbm_mask[256];
-unsigned long long_mask;
-unsigned long cache_size;
+static int count_of_bits;
+static char cbm_mask[256];
+static unsigned long long_mask;
+static unsigned long cache_size;
 
 /*
  * Change schemata. Write schemata to specified
@@ -136,7 +136,7 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
return -1;
 
/* Get default cbm mask for L3/L2 cache */
-   ret = get_cbm_mask(cache_type);
+   ret = get_cbm_mask(cache_type, cbm_mask);
if (ret)
return ret;
 
diff --git a/tools/testing/selftests/resctrl/cqm_test.c 
b/tools/testing/selftests/resctrl/cqm_test.c
index 5e7308ac63be..de33d1c0466e 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -16,10 +16,10 @@
 #define MAX_DIFF   200
 #define MAX_DIFF_PERCENT   15
 
-int count_of_bits;
-char cbm_mask[256];
-unsigned long long_mask;
-unsigned long cache_size;
+static int count_of_bits;
+static char cbm_mask[256];
+static unsigned long long_mask;
+static unsigned long cache_size;
 
 static int cqm_setup(int num, ...)
 {
@@ -125,7 +125,7 @@ int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd)
if (!validate_resctrl_feature_request("cqm"))
return -1;
 
-   ret = get_cbm_mask("L3");
+   ret = get_cbm_mask("L3", cbm_mask);
if (ret)
return ret;
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h 
b/tools/testing/selftests/resctrl/resctrl.h
index 39bf59c6b9c5..959c71e39bdc 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -92,7 +92,7 @@ void tests_cleanup(void);
 void mbm_test_cleanup(void);
 int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd);
 void mba_test_cleanup(void);
-int get_cbm_mask(char *cache_type);
+int get_cbm_mask(char *cache_type, char *cbm_mask);
 int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
 void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
 int cat_val(struct resctrl_val_param *param);
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
b/tools/testing/selftests/resctrl/resctrlfs.c
index 19c0ec4045a4..2a16100c9c3f 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -49,8 +49,6 @@ static int find_resctrl_mount(char *buffer)
return -ENOENT;
 }
 
-char cbm_mask[256];
-
 /*
  * remount_resctrlfs - Remount resctrl FS at /sys/fs/resctrl
  * @mum_resctrlfs: Should the resctrl FS be remounted?
@@ -205,16 +203,18 @@ int get_cache_size(int cpu_no, char *cache_type, unsigned 
long *cache_size)
 /*
  * get_cbm_mask - Get cbm mask for given cache
  * @cache_type:Cache level L2/L3
- *
- * Mask is stored in cbm_mask which is global variable.
+ * @cbm_mask:  cbm_mask returned as a string
  *
  * Return: = 0 on success, < 0 on failure.
  */
-int get_cbm_mask(char *cache_type)
+int get_cbm_mask(char *cache_type, char *cbm_mask)
 {
char cbm_mask_path[1024];
FILE *fp;
 
+   if (!cbm_mask)
+   return -1;
+
sprintf(cbm_mask_path, "%s/%s/cbm_mask", CBM_MASK_PATH, 

[PATCH v6 05/21] selftests/resctrl: Ensure sibling CPU is not same as original CPU

2021-03-16 Thread Fenghua Yu
From: Reinette Chatre 

The resctrl tests can accept a CPU on which the tests are run and use
default of CPU #1 if it is not provided. In the CAT test a "sibling CPU"
is determined that is from the same package where another thread will be
run.

The current algorithm with which a "sibling CPU" is determined does not
take the provided/default CPU into account and when that CPU is the
first CPU in a package then the "sibling CPU" will be selected to be the
same CPU since it starts by picking the first CPU from core_siblings_list.

Fix the "sibling CPU" selection by taking the provided/default CPU into
account and ensuring a sibling that is a different CPU is selected.

Tested-by: Babu Moger 
Signed-off-by: Reinette Chatre 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Move from v4's patch 8 to this patch as the fix patch should be first
  (Shuah).

 tools/testing/selftests/resctrl/resctrlfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
b/tools/testing/selftests/resctrl/resctrlfs.c
index 4174e48e06d1..bc52076bee7f 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -268,7 +268,7 @@ int get_core_sibling(int cpu_no)
while (token) {
sibling_cpu_no = atoi(token);
/* Skipping core 0 as we don't want to run test on core 0 */
-   if (sibling_cpu_no != 0)
+   if (sibling_cpu_no != 0 && sibling_cpu_no != cpu_no)
break;
token = strtok(NULL, "-,");
}
-- 
2.31.0



[PATCH v6 06/21] selftests/resctrl: Fix missing options "-n" and "-p"

2021-03-16 Thread Fenghua Yu
resctrl test suite accepts command line arguments (like -b, -t, -n and -p)
as documented in the help. But passing -n and -p throws an invalid option
error. This happens because -n and -p are missing in the list of
characters that getopt() recognizes as valid arguments. Hence, they are
treated as invalid options.

Fix this by adding them to the list of characters that getopt() recognizes
as valid arguments. Please note that the main() function already has the
logic to deal with the values passed as part of these arguments and hence
no changes are needed there.

Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Move from v4's patch 9 to this patch as the fix patch should be first
  (Shuah).

 tools/testing/selftests/resctrl/resctrl_tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c 
b/tools/testing/selftests/resctrl/resctrl_tests.c
index 4b109a59f72d..ac2269610aa9 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -73,7 +73,7 @@ int main(int argc, char **argv)
}
}
 
-   while ((c = getopt(argc_new, argv, "ht:b:")) != -1) {
+   while ((c = getopt(argc_new, argv, "ht:b:n:p:")) != -1) {
char *token;
 
switch (c) {
-- 
2.31.0



[PATCH v6 04/21] selftests/resctrl: Clean up resctrl features check

2021-03-16 Thread Fenghua Yu
Checking resctrl features calls strcmp() to compare feature strings
(e.g. "mba", "cat" etc). The checks are error prone and don't have
good coding style. Define the constant strings in macros and call
strncmp() to solve the potential issues.
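
Presumably the constants become simple defines along these lines (a
sketch; the names match the CAT_STR/CQM_STR/MBA_STR uses visible in the
diff below):

	#define CAT_STR		"cat"
	#define CQM_STR		"cqm"
	#define MBA_STR		"mba"
	#define MBM_STR		"mbm"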

Suggested-by: Shuah Khan 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Remove is_cat() etc functions and directly call strncmp() to check
  the features (Shuah).

 tools/testing/selftests/resctrl/cache.c   |  8 +++
 tools/testing/selftests/resctrl/cat_test.c|  2 +-
 tools/testing/selftests/resctrl/cqm_test.c|  2 +-
 tools/testing/selftests/resctrl/fill_buf.c|  4 ++--
 tools/testing/selftests/resctrl/mba_test.c|  2 +-
 tools/testing/selftests/resctrl/mbm_test.c|  2 +-
 tools/testing/selftests/resctrl/resctrl.h |  5 +
 .../testing/selftests/resctrl/resctrl_tests.c | 12 +-
 tools/testing/selftests/resctrl/resctrl_val.c | 22 +--
 tools/testing/selftests/resctrl/resctrlfs.c   | 17 +++---
 10 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c 
b/tools/testing/selftests/resctrl/cache.c
index 38dbf4962e33..5922cc1b0386 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -182,7 +182,7 @@ int measure_cache_vals(struct resctrl_val_param *param, int 
bm_pid)
/*
 * Measure cache miss from perf.
 */
-   if (!strcmp(param->resctrl_val, "cat")) {
+   if (!strncmp(param->resctrl_val, CAT_STR, sizeof(CAT_STR))) {
ret = get_llc_perf(_perf_miss);
if (ret < 0)
return ret;
@@ -192,7 +192,7 @@ int measure_cache_vals(struct resctrl_val_param *param, int 
bm_pid)
/*
 * Measure llc occupancy from resctrl.
 */
-   if (!strcmp(param->resctrl_val, "cqm")) {
+   if (!strncmp(param->resctrl_val, CQM_STR, sizeof(CQM_STR))) {
ret = get_llc_occu_resctrl(_occu_resc);
if (ret < 0)
return ret;
@@ -234,7 +234,7 @@ int cat_val(struct resctrl_val_param *param)
if (ret)
return ret;
 
-   if ((strcmp(resctrl_val, "cat") == 0)) {
+   if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
ret = initialize_llc_perf();
if (ret)
return ret;
@@ -242,7 +242,7 @@ int cat_val(struct resctrl_val_param *param)
 
/* Test runs until the callback setup() tells the test to stop. */
while (1) {
-   if (strcmp(resctrl_val, "cat") == 0) {
+   if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
ret = param->setup(1, param);
if (ret) {
ret = 0;
diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index bdeeb5772592..20823725daca 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -164,7 +164,7 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
return -1;
 
struct resctrl_val_param param = {
-   .resctrl_val= "cat",
+   .resctrl_val= CAT_STR,
.cpu_no = cpu_no,
.mum_resctrlfs  = 0,
.setup  = cat_setup,
diff --git a/tools/testing/selftests/resctrl/cqm_test.c 
b/tools/testing/selftests/resctrl/cqm_test.c
index de33d1c0466e..271752e9ef5b 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -145,7 +145,7 @@ int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd)
}
 
struct resctrl_val_param param = {
-   .resctrl_val= "cqm",
+   .resctrl_val= CQM_STR,
.ctrlgrp= "c1",
.mongrp = "m1",
.cpu_no = cpu_no,
diff --git a/tools/testing/selftests/resctrl/fill_buf.c 
b/tools/testing/selftests/resctrl/fill_buf.c
index 79c611c99a3d..51e5cf22632f 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -115,7 +115,7 @@ static int fill_cache_read(unsigned char *start_ptr, 
unsigned char *end_ptr,
 
while (1) {
ret = fill_one_span_read(start_ptr, end_ptr);
-   if (!strcmp(resctrl_val, "cat"))
+   if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)))
break;
}
 
@@ -134,7 +134,7 @@ static int fill_cache_write(unsigned char *start_ptr, 
unsigned char *end_ptr,
 {
while (1) {
fill_one_span_write(start_ptr, end_ptr);
-   if (!strcmp(resctrl_val, "cat"))
+   if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)))
break;
}
 
diff --git 

[PATCH v6 11/21] selftests/resctrl: Add config dependencies

2021-03-16 Thread Fenghua Yu
Add the config file for test dependencies.

Suggested-by: Shuah Khan 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Add this patch (Shuah)

 tools/testing/selftests/resctrl/config | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 tools/testing/selftests/resctrl/config

diff --git a/tools/testing/selftests/resctrl/config 
b/tools/testing/selftests/resctrl/config
new file mode 100644
index ..8d9f2deb56ed
--- /dev/null
+++ b/tools/testing/selftests/resctrl/config
@@ -0,0 +1,2 @@
+CONFIG_X86_CPU_RESCTRL=y
+CONFIG_PROC_CPU_RESCTRL=y
-- 
2.31.0



[PATCH v6 07/21] selftests/resctrl: Rename CQM test as CMT test

2021-03-16 Thread Fenghua Yu
CMT (Cache Monitoring Technology) [1] is a H/W feature that reports cache
occupancy of a process. resctrl selftest suite has a unit test to test CMT
for LLC but the test is named as CQM (Cache Quality Monitoring).
Furthermore, the unit test source file is named as cqm_test.c and several
functions, variables, comments, preprocessors and statements widely use
"cqm" as either suffix or prefix. This rampant misusage of CQM for CMT
might confuse someone who is newly looking at resctrl selftests because
this feature is named CMT in the Intel Software Developer's Manual.

Hence, rename all the occurrences (unit test source file name, functions,
variables, comments and preprocessors) of cqm with cmt.

[1] Please see Intel SDM, Volume 3, chapter 17 and section 18 for more
information on CMT: 
https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
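
A minimal, self-contained sketch of the renamed convention (hedged: the
*_STR define is assumed to match the one resctrl.h gains earlier in this
series; note that sizeof(CMT_STR) counts the trailing NUL, so the
strncmp() calls in the diffs are exact-string matches, not prefix
matches):

#include <stdio.h>
#include <string.h>

#define CMT_STR "cmt"   /* renamed from CQM_STR "cqm" */

int main(void)
{
        const char *resctrl_val = CMT_STR;

        /* Exact match: the compared length includes the NUL. */
        if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
                printf("CMT (LLC occupancy) path selected\n");
        return 0;
}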

Suggested-by: Reinette Chatre 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/README|  4 +--
 tools/testing/selftests/resctrl/cache.c   |  4 +--
 .../resctrl/{cqm_test.c => cmt_test.c}| 20 +++---
 tools/testing/selftests/resctrl/resctrl.h |  6 ++---
 .../testing/selftests/resctrl/resctrl_tests.c | 26 +--
 tools/testing/selftests/resctrl/resctrl_val.c | 12 -
 tools/testing/selftests/resctrl/resctrlfs.c   | 10 +++
 7 files changed, 41 insertions(+), 41 deletions(-)
 rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (89%)

diff --git a/tools/testing/selftests/resctrl/README 
b/tools/testing/selftests/resctrl/README
index 6e5a0ffa18e8..4b36b25b6ac0 100644
--- a/tools/testing/selftests/resctrl/README
+++ b/tools/testing/selftests/resctrl/README
@@ -46,8 +46,8 @@ ARGUMENTS
 Parameter '-h' shows usage information.
 
 usage: resctrl_tests [-h] [-b "benchmark_cmd [options]"] [-t test list] [-n 
no_of_bits]
--b benchmark_cmd [options]: run specified benchmark for MBM, MBA and 
CQM default benchmark is builtin fill_buf
--t test list: run tests specified in the test list, e.g. -t mbm, mba, 
cqm, cat
+-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and 
CMT default benchmark is builtin fill_buf
+-t test list: run tests specified in the test list, e.g. -t mbm, mba, 
cmt, cat
 -n no_of_bits: run cache tests using specified no of bits in cache bit 
mask
 -p cpu_no: specify CPU number to run the test. 1 is default
 -h: help
diff --git a/tools/testing/selftests/resctrl/cache.c 
b/tools/testing/selftests/resctrl/cache.c
index 5922cc1b0386..2aa1b5c7d9e1 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -111,7 +111,7 @@ static int get_llc_perf(unsigned long *llc_perf_miss)
 
 /*
  * Get LLC Occupancy as reported by RESCTRL FS
- * For CQM,
+ * For CMT,
  * 1. If con_mon grp and mon grp given, then read from mon grp in
  * con_mon grp
  * 2. If only con_mon grp given, then read from con_mon grp
@@ -192,7 +192,7 @@ int measure_cache_vals(struct resctrl_val_param *param, int 
bm_pid)
/*
 * Measure llc occupancy from resctrl.
 */
-   if (!strncmp(param->resctrl_val, CQM_STR, sizeof(CQM_STR))) {
+   if (!strncmp(param->resctrl_val, CMT_STR, sizeof(CMT_STR))) {
ret = get_llc_occu_resctrl(&llc_occu_resc);
if (ret < 0)
return ret;
diff --git a/tools/testing/selftests/resctrl/cqm_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
similarity index 89%
rename from tools/testing/selftests/resctrl/cqm_test.c
rename to tools/testing/selftests/resctrl/cmt_test.c
index 271752e9ef5b..4b63838dda32 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Cache Monitoring Technology (CQM) test
+ * Cache Monitoring Technology (CMT) test
  *
  * Copyright (C) 2018 Intel Corporation
  *
@@ -11,7 +11,7 @@
 #include "resctrl.h"
 #include 
 
-#define RESULT_FILE_NAME   "result_cqm"
+#define RESULT_FILE_NAME   "result_cmt"
 #define NUM_OF_RUNS5
 #define MAX_DIFF   200
 #define MAX_DIFF_PERCENT   15
@@ -21,7 +21,7 @@ static char cbm_mask[256];
 static unsigned long long_mask;
 static unsigned long cache_size;
 
-static int cqm_setup(int num, ...)
+static int cmt_setup(int num, ...)
 {
struct resctrl_val_param *p;
va_list param;
@@ -58,7 +58,7 @@ static void show_cache_info(unsigned long sum_llc_occu_resc, 
int no_of_bits,
else
res = false;
 
-   printf("%sok CQM: diff within %d, %d\%%\n", res ? "" : "not",
+   printf("%sok CMT: diff within %d, %d\%%\n", res ? "" : "not",
   MAX_DIFF, (int)MAX_DIFF_PERCENT);
 
printf("# diff: %ld\n", avg_diff);
@@ -106,12 +106,12 @@ static int check_results(struct resctrl_val_param *param, 
int no_of_bits)
return 0;
 }
 

[PATCH v6 09/21] selftests/resctrl: Share show_cache_info() by CAT and CMT tests

2021-03-16 Thread Fenghua Yu
The show_cache_info() functions are defined separately in the CAT and
CMT tests, but they are identical, so there is no need to define them
twice. Share a single function between the two tests; the resulting
call sites are sketched below.
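
For reference, the two call sites after unification look roughly as
follows. The CAT call matches the diff below; the CMT call is an
assumed counterpart:

/* CAT: span is converted to cache lines and the platform gate is
 * !is_amd; cmt=false applies only the percentage bound.
 */
ret = show_cache_info(sum_llc_perf_miss, no_of_bits, param->span / 64,
                      MAX_DIFF, MAX_DIFF_PERCENT, NUM_OF_RUNS,
                      !is_amd, false);

/* CMT (assumed): span stays in bytes; cmt=true also applies the
 * absolute MAX_DIFF bound.
 */
ret = show_cache_info(sum_llc_occu_resc, no_of_bits, param->span,
                      MAX_DIFF, MAX_DIFF_PERCENT, NUM_OF_RUNS,
                      true, true);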

Suggested-by: Shuah Khan 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Add this patch (Shuah)

 tools/testing/selftests/resctrl/cache.c| 42 ++
 tools/testing/selftests/resctrl/cat_test.c | 28 ++-
 tools/testing/selftests/resctrl/cmt_test.c | 33 ++---
 tools/testing/selftests/resctrl/resctrl.h  |  4 +++
 4 files changed, 52 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c 
b/tools/testing/selftests/resctrl/cache.c
index 2aa1b5c7d9e1..362e3a418caa 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -270,3 +270,45 @@ int cat_val(struct resctrl_val_param *param)
 
return ret;
 }
+
+/*
+ * show_cache_info:show cache test result information
+ * @sum_llc_val:   sum of LLC cache result data
+ * @no_of_bits:number of bits
+ * @cache_span:cache span in bytes for CMT or in lines for CAT
+ * @max_diff:  max difference
+ * @max_diff_percent:  max difference percentage
+ * @num_of_runs:   number of runs
+ * @platform:  show test information on this platform
+ * @cmt:   CMT test or CAT test
+ *
+ * Return: 0 on success. non-zero on failure.
+ */
+int show_cache_info(unsigned long sum_llc_val, int no_of_bits,
+   unsigned long cache_span, unsigned long max_diff,
+   unsigned long max_diff_percent, unsigned long num_of_runs,
+   bool platform, bool cmt)
+{
+   unsigned long avg_llc_val = 0;
+   float diff_percent;
+   long avg_diff = 0;
+   int ret;
+
+   avg_llc_val = sum_llc_val / (num_of_runs - 1);
+   avg_diff = (long)abs(cache_span - avg_llc_val);
+   diff_percent = ((float)cache_span - avg_llc_val) / cache_span * 100;
+
+   ret = platform && abs((int)diff_percent) > max_diff_percent &&
+ (cmt ? (abs(avg_diff) > max_diff) : true);
+
+   ksft_print_msg("%s cache miss rate within %d%%\n",
+  ret ? "Fail:" : "Pass:", max_diff_percent);
+
+   ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
+   ksft_print_msg("Number of bits: %d\n", no_of_bits);
+   ksft_print_msg("Average LLC val: %lu\n", avg_llc_val);
+   ksft_print_msg("Cache span (%s): %lu\n", cmt ? "bytes" : "lines",
+  cache_span);
+
+   return ret;
+}
diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index 1daf911076c7..090d3afc7a78 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -52,30 +52,6 @@ static int cat_setup(int num, ...)
return ret;
 }
 
-static int show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
-  unsigned long span)
-{
-   unsigned long allocated_cache_lines = span / 64;
-   unsigned long avg_llc_perf_miss = 0;
-   float diff_percent;
-   int ret;
-
-   avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);
-   diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
-   allocated_cache_lines * 100;
-
-   ret = !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT;
-   ksft_print_msg("Cache miss rate %swithin %d%%\n",
-  ret ? "not " : "", MAX_DIFF_PERCENT);
-
-   ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
-   ksft_print_msg("Number of bits: %d\n", no_of_bits);
-   ksft_print_msg("Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
-   ksft_print_msg("Allocated cache lines: %lu\n", allocated_cache_lines);
-
-   return ret;
-}
-
 static int check_results(struct resctrl_val_param *param)
 {
char *token_array[8], temp[512];
@@ -111,7 +87,9 @@ static int check_results(struct resctrl_val_param *param)
fclose(fp);
no_of_bits = count_bits(param->mask);
 
-   return show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
+   return show_cache_info(sum_llc_perf_miss, no_of_bits, param->span / 64,
+  MAX_DIFF, MAX_DIFF_PERCENT, NUM_OF_RUNS,
+  !is_amd, false);
 }
 
 void cat_test_cleanup(void)
diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
index b1ab1bd1f74d..8968e36db99d 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -39,35 +39,6 @@ static int cmt_setup(int num, ...)
return 0;
 }
 
-static int show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
-  unsigned long span)
-{
-   unsigned long avg_llc_occu_resc = 0;
-   float 

[PATCH v6 12/21] selftests/resctrl: Check for resctrl mount point only if resctrl FS is supported

2021-03-16 Thread Fenghua Yu
check_resctrlfs_support() does the following:
1. Checks whether the platform supports the resctrl file system by
   looking for resctrl in /proc/filesystems
2. Calls opendir() on the default resctrl file system path
   (i.e. /sys/fs/resctrl)
3. Checks whether the resctrl file system is mounted by looking at
   /proc/mounts

Steps 2 and 3 will fail if the platform does not support the resctrl
file system, so there is no need to perform them when step 1 fails.

Fix this by returning immediately when the platform does not support
the resctrl file system, as in the sketch below.
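
Condensed into a self-contained sketch (the helpers are simplified
assumptions; only the early return mirrors the actual diff):

#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define RESCTRL_PATH "/sys/fs/resctrl"

/* Step 1: look for "resctrl" in /proc/filesystems. */
static bool kernel_supports_resctrl(void)
{
        char line[256];
        bool found = false;
        FILE *fp = fopen("/proc/filesystems", "r");

        if (!fp)
                return false;
        while (!found && fgets(line, sizeof(line), fp))
                found = strstr(line, "resctrl") != NULL;
        fclose(fp);
        return found;
}

bool check_resctrlfs_support(void)
{
        bool ret = kernel_supports_resctrl();
        DIR *dp;

        printf("%s kernel supports resctrl filesystem\n",
               ret ? "Pass:" : "Fail:");

        /* Unsupported platform: steps 2 and 3 cannot succeed, bail out. */
        if (!ret)
                return ret;

        /* Step 2: check that the default mount point exists. */
        dp = opendir(RESCTRL_PATH);
        printf("%s resctrl mountpoint \"%s\" exists\n",
               dp ? "Pass:" : "Fail:", RESCTRL_PATH);
        if (dp)
                closedir(dp);
        /* Step 3 (scanning /proc/mounts) is elided in this sketch. */
        return ret;
}

int main(void)
{
        return check_resctrlfs_support() ? 0 : 1;
}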

Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
 tools/testing/selftests/resctrl/resctrlfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c 
b/tools/testing/selftests/resctrl/resctrlfs.c
index 6b22a186790a..87195eb78356 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -570,6 +570,9 @@ bool check_resctrlfs_support(void)
ksft_print_msg("%s kernel supports resctrl filesystem\n",
   ret ? "Pass:" : "Fail:");
 
+   if (!ret)
+   return ret;
+
dp = opendir(RESCTRL_PATH);
ksft_print_msg("%s resctrl mountpoint \"%s\" exists\n",
   dp ? "Pass:" : "Fail:", RESCTRL_PATH);
-- 
2.31.0



[PATCH v6 08/21] selftests/resctrl: Call kselftest APIs to log test results

2021-03-16 Thread Fenghua Yu
Call the kselftest APIs instead of printf() to log test results; this
makes the code cleaner and easier to extend. A minimal before/after
sketch follows.
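
The sketch below assumes the helpers in
tools/testing/selftests/kselftest.h: ksft_print_msg() prefixes "# "
just as the old hand-written printf("# ...") calls did, and
ksft_test_result() emits the TAP "ok"/"not ok" line while keeping the
pass/fail counters, replacing the hand-maintained tests_run:

/* Before */
printf("%sok CAT: cache miss rate within %d%%\n",
       res ? "not " : "", MAX_DIFF_PERCENT);
printf("# Percent diff=%d\n", diff_percent);
tests_run++;

/* After */
ksft_print_msg("Cache miss rate %swithin %d%%\n",
               res ? "not " : "", MAX_DIFF_PERCENT);
ksft_print_msg("Percent diff=%d\n", diff_percent);
ksft_test_result(!res, "CAT: test\n");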

Suggested-by: Shuah Khan 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v6:
- Capitalize the first letter in printed msg (Babu).

v5:
- Add this patch (Shuah)

 tools/testing/selftests/resctrl/cat_test.c| 37 +++
 tools/testing/selftests/resctrl/cmt_test.c| 42 -
 tools/testing/selftests/resctrl/mba_test.c| 24 +-
 tools/testing/selftests/resctrl/mbm_test.c| 28 ++--
 tools/testing/selftests/resctrl/resctrl.h |  2 +-
 .../testing/selftests/resctrl/resctrl_tests.c | 40 +
 tools/testing/selftests/resctrl/resctrl_val.c |  4 +-
 tools/testing/selftests/resctrl/resctrlfs.c   | 45 +++
 8 files changed, 105 insertions(+), 117 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c 
b/tools/testing/selftests/resctrl/cat_test.c
index 20823725daca..1daf911076c7 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -52,25 +52,28 @@ static int cat_setup(int num, ...)
return ret;
 }
 
-static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
-   unsigned long span)
+static int show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
+  unsigned long span)
 {
unsigned long allocated_cache_lines = span / 64;
unsigned long avg_llc_perf_miss = 0;
float diff_percent;
+   int ret;
 
avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);
diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
allocated_cache_lines * 100;
 
-   printf("%sok CAT: cache miss rate within %d%%\n",
-  !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
-  "not " : "", MAX_DIFF_PERCENT);
-   tests_run++;
-   printf("# Percent diff=%d\n", abs((int)diff_percent));
-   printf("# Number of bits: %d\n", no_of_bits);
-   printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
-   printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
+   ret = !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT;
+   ksft_print_msg("Cache miss rate %swithin %d%%\n",
+  ret ? "not " : "", MAX_DIFF_PERCENT);
+
+   ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
+   ksft_print_msg("Number of bits: %d\n", no_of_bits);
+   ksft_print_msg("Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
+   ksft_print_msg("Allocated cache lines: %lu\n", allocated_cache_lines);
+
+   return ret;
 }
 
 static int check_results(struct resctrl_val_param *param)
@@ -80,7 +83,7 @@ static int check_results(struct resctrl_val_param *param)
int runs = 0, no_of_bits = 0;
FILE *fp;
 
-   printf("# Checking for pass/fail\n");
+   ksft_print_msg("Checking for pass/fail\n");
fp = fopen(param->filename, "r");
if (!fp) {
perror("# Cannot open file");
@@ -108,9 +111,7 @@ static int check_results(struct resctrl_val_param *param)
fclose(fp);
no_of_bits = count_bits(param->mask);
 
-   show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
-
-   return 0;
+   return show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
 }
 
 void cat_test_cleanup(void)
@@ -146,15 +147,15 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
ret = get_cache_size(cpu_no, cache_type, _size);
if (ret)
return ret;
-   printf("cache size :%lu\n", cache_size);
+   ksft_print_msg("Cache size :%lu\n", cache_size);
 
/* Get max number of bits from default-cabm mask */
count_of_bits = count_bits(long_mask);
 
if (n < 1 || n > count_of_bits - 1) {
-   printf("Invalid input value for no_of_bits n!\n");
-   printf("Please Enter value in range 1 to %d\n",
-  count_of_bits - 1);
+   ksft_print_msg("Invalid input value for no_of_bits n!\n");
+   ksft_print_msg("Please enter value in range 1 to %d\n",
+  count_of_bits - 1);
return -1;
}
 
diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
index 4b63838dda32..b1ab1bd1f74d 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -39,36 +39,33 @@ static int cmt_setup(int num, ...)
return 0;
 }
 
-static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
-   unsigned long span)
+static int show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
+  unsigned long span)
 {
unsigned long avg_llc_occu_resc = 0;
float diff_percent;
long 

[PATCH v6 01/21] selftests/resctrl: Enable gcc checks to detect buffer overflows

2021-03-16 Thread Fenghua Yu
David reported a buffer overflow error in the check_results() function of
the cmt unit test and suggested enabling the _FORTIFY_SOURCE gcc compiler
option to automatically detect such errors.

The feature_test_macros(7) man page describes _FORTIFY_SOURCE as below:

"Defining this macro causes some lightweight checks to be performed to
detect some buffer overflow errors when employing various string and memory
manipulation functions (for example, memcpy, memset, stpcpy, strcpy,
strncpy, strcat, strncat, sprintf, snprintf, vsprintf, vsnprintf, gets, and
wide character variants thereof). For some functions, argument consistency
is checked; for example, a check is made that open has been supplied with a
mode argument when the specified flags include O_CREAT. Not all problems
are detected, just some common cases.

If _FORTIFY_SOURCE is set to 1, with compiler optimization level 1 (gcc
-O1) and above, checks that shouldn't change the behavior of conforming
programs are performed.

With _FORTIFY_SOURCE set to 2, some more checking is added, but some
conforming programs might fail.

Some of the checks can be performed at compile time (via macros logic
implemented in header files), and result in compiler warnings; other checks
take place at run time, and result in a run-time error if the check fails.

Use of this macro requires compiler support, available with gcc since
version 4.0."

Fix the buffer overflow error in the check_results() function of the cmt
unit test and enable the _FORTIFY_SOURCE gcc check to catch any future
buffer overflow errors; a minimal illustration follows.
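
A self-contained sketch of the class of bug involved (the file name and
loop are made up; the 1024 vs. sizeof(temp) mismatch mirrors the
cqm_test.c fix below):

#include <stdio.h>

int main(void)
{
        char temp[512];
        FILE *fp = fopen("results", "r");

        if (!fp)
                return 1;

        /* Buggy: the bound 1024 exceeds sizeof(temp). Built with
         * -O2 -D_FORTIFY_SOURCE=2, glibc's fortified fgets() sees the
         * oversized bound and aborts instead of silently overflowing;
         * the fix is fgets(temp, sizeof(temp), fp) as in the patch.
         */
        while (fgets(temp, 1024, fp))
                ;

        fclose(fp);
        return 0;
}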

Reported-by: David Binderman 
Suggested-by: David Binderman 
Tested-by: Babu Moger 
Signed-off-by: Fenghua Yu 
---
Change Log:
v5:
- Move from v4's patch 11 to patch 1 so the fix patch should be first
  (Shuah).

 tools/testing/selftests/resctrl/Makefile   | 2 +-
 tools/testing/selftests/resctrl/cqm_test.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/Makefile 
b/tools/testing/selftests/resctrl/Makefile
index d585cc1948cc..6bcee2ec91a9 100644
--- a/tools/testing/selftests/resctrl/Makefile
+++ b/tools/testing/selftests/resctrl/Makefile
@@ -1,5 +1,5 @@
 CC = $(CROSS_COMPILE)gcc
-CFLAGS = -g -Wall
+CFLAGS = -g -Wall -O2 -D_FORTIFY_SOURCE=2
 SRCS=$(wildcard *.c)
 OBJS=$(SRCS:.c=.o)
 
diff --git a/tools/testing/selftests/resctrl/cqm_test.c 
b/tools/testing/selftests/resctrl/cqm_test.c
index c8756152bd61..5e7308ac63be 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -86,7 +86,7 @@ static int check_results(struct resctrl_val_param *param, int 
no_of_bits)
return errno;
}
 
-   while (fgets(temp, 1024, fp)) {
+   while (fgets(temp, sizeof(temp), fp)) {
char *token = strtok(temp, ":\t");
int fields = 0;
 
-- 
2.31.0


