Re: [PATCH v4 net] net: hv_netvsc: fix netvsc_send_completion to avoid multiple message length checks

2023-09-10 Thread Simon Horman
On Thu, Sep 07, 2023 at 12:56:47PM -0700, Sonia Sharma wrote:
> From: Sonia Sharma 
> 
> The switch statement in netvsc_send_completion() is incorrectly validating
> the length of incoming network packets by falling through to the next case.
> Avoid the fallthrough. Instead break after a case match and then process
> the complete() call.
> The current code has not caused any known failures. But nonetheless, the
> code should be corrected as a different ordering of the switch cases might
> cause a length check to fail when it should not.
> 
> Fixes: 44144185951a0f ("hv_netvsc: Add validation for untrusted Hyper-V values")

As the current code is correct - it works - I feel that this is more of a
clean-up than a fix. As such I suggest dropping the fixes tag and
retargeting at net-next (which is due to re-open in the coming days).

> Signed-off-by: Sonia Sharma 
> 
> ---
> Changes in v3:
> * added return statement in default case as pointed by Michael Kelley.
> Changes in v4:
> * added fixes tag
> * modified commit message to explain the issue fixed by patch.
> ---
>  drivers/net/hyperv/netvsc.c | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 82e9796c8f5e..0f7e4d36 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -851,7 +851,7 @@ static void netvsc_send_completion(struct net_device 
> *ndev,
>  msglen);
>   return;
>   }
> - fallthrough;
> + break;
>  
>   case NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE:
>   if (msglen < sizeof(struct nvsp_message_header) +
> @@ -860,7 +860,7 @@ static void netvsc_send_completion(struct net_device 
> *ndev,
>  msglen);
>   return;
>   }
> - fallthrough;
> + break;
>  
>   case NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE:
>   if (msglen < sizeof(struct nvsp_message_header) +
> @@ -869,7 +869,7 @@ static void netvsc_send_completion(struct net_device 
> *ndev,
>  msglen);
>   return;
>   }
> - fallthrough;
> + break;
>  
>   case NVSP_MSG5_TYPE_SUBCHANNEL:
>   if (msglen < sizeof(struct nvsp_message_header) +
> @@ -878,10 +878,6 @@ static void netvsc_send_completion(struct net_device 
> *ndev,
>  msglen);
>   return;
>   }
> - /* Copy the response back */
> - memcpy(&net_device->channel_init_pkt, nvsp_packet,
> -sizeof(struct nvsp_message));
> - complete(&net_device->channel_init_wait);
>   break;
>  
>   case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
> @@ -904,13 +900,19 @@ static void netvsc_send_completion(struct net_device 
> *ndev,
>  
>   netvsc_send_tx_complete(ndev, net_device, incoming_channel,
>   desc, budget);
> - break;
> + return;
>  
>   default:
>   netdev_err(ndev,
>  "Unknown send completion type %d received!!\n",
>  nvsp_packet->hdr.msg_type);
> + return;
>   }
> +
> + /* Copy the response back */
> + memcpy(&net_device->channel_init_pkt, nvsp_packet,
> + sizeof(struct nvsp_message));

nit: the indentation of the line above is not correct.

	memcpy(&net_device->channel_init_pkt, nvsp_packet,
	       sizeof(struct nvsp_message));

> + complete(&net_device->channel_init_wait);
>  }
>  
>  static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
> -- 
> 2.25.1
> 
> 



Re: [PATCH v1 1/1] bitops: Share BYTES_TO_BITS() for everyone

2023-09-10 Thread Yury Norov
On Wed, Sep 06, 2023 at 05:54:26PM +0300, Andy Shevchenko wrote:
> On Wed, Sep 06, 2023 at 04:40:39PM +0200, Alexander Lobakin wrote:
> > From: Andy Shevchenko 
> > Date: Thu, 31 Aug 2023 16:21:30 +0300
> > > On Fri, Aug 25, 2023 at 04:49:07PM +0200, Alexander Lobakin wrote:
> > >> From: Andy Shevchenko 
> > >> Date: Thu, 24 Aug 2023 15:37:28 +0300
> > >>
> > >>> It may be new callers for the same macro, share it.
> > >>>
> > >>> Note, it's unknown why it's represented in the current form instead of
> > >>> simple multiplication and commit 1ff511e35ed8 ("tracing/kprobes: Add
> > >>> bitfield type") doesn't explain that either. Let's leave it as is and
> > >>> we may improve it in the future.
> > >>
> > >> Maybe symmetrical change in tools/ like I did[0] an aeon ago?
> > > 
> > > Hmm... Why can't you simply upstream your version? It seems better than 
> > > mine.
> > 
> > It was a part of the Netlink bigint API which is a bit on hold for now
> > (I needed this macro available treewide).
> > But I can send it as standalone if you're fine with that.
> 
> I'm fine. Yury?

Do we have opencoded BYTES_TO_BITS() somewhere else? If so, it should be
fixed in the same series.

Regarding implementation, the current:

#define BYTES_TO_BITS(nb)  ((BITS_PER_LONG * (nb)) / sizeof(long))

looks weird. Maybe there were special considerations in the tracing
subsystem that made it this way, but as per Masami's email, there
were not.

For a general purpose I'd suggest a simpler:
#define BYTES_TO_BITS(nb)  ((nb) * BITS_PER_BYTE)

Thanks,
Yury


Re: [PATCH] selftests/net: Improve bind_bhash.sh to accommodate predictable network interface names

2023-09-10 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller :

On Thu,  7 Sep 2023 00:26:03 +0800 you wrote:
> Starting with v197, systemd uses predictable network interface names;
> the traditional interface naming scheme (eth0) is deprecated, therefore
> it cannot be assumed that an eth0 interface exists on the host.
> 
> This modification makes the bind_bhash test program run in a separate
> network namespace and no longer needs to consider the name of the
> network interface on the host.
> 
> [...]

Here is the summary with links:
  - selftests/net: Improve bind_bhash.sh to accommodate predictable network 
interface names
https://git.kernel.org/netdev/net/c/ced33ca07d8d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




[PATCH] selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings

2023-09-10 Thread Thomas Weißschuh
Newer versions of glibc annotate the poll() function with
__attribute__((access)), which triggers a compiler warning inside the
poll_fault testcase.
Avoid this by using a plain NULL, which is enough for the testcase.
To avoid potential future warnings, also adapt the other EFAULT
testcases, except select_fault, as NULL is a valid value for its
argument.

nolibc-test.c: In function ‘run_syscall’:
nolibc-test.c:338:62: warning: ‘poll’ writing 8 bytes into a region of size 0 
overflows the destination [-Wstringop-overflow=]
  338 | do { if (!(cond)) result(llen, SKIPPED); else ret += 
expect_syserr2(expr, expret, experr1, experr2, llen); } while (0)
  |  
^~~~
nolibc-test.c:341:9: note: in expansion of macro ‘EXPECT_SYSER2’
  341 | EXPECT_SYSER2(cond, expr, expret, experr, 0)
  | ^
nolibc-test.c:905:47: note: in expansion of macro ‘EXPECT_SYSER’
  905 | CASE_TEST(poll_fault);EXPECT_SYSER(1, 
poll((void *)1, 1, 0), -1, EFAULT); break;
  |   ^~~~
cc1: note: destination object is likely at address zero
In file included from /usr/include/poll.h:1,
 from nolibc-test.c:33:
/usr/include/sys/poll.h:54:12: note: in a call to function ‘poll’ declared with 
attribute ‘access (write_only, 1, 2)’
   54 | extern int poll (struct pollfd *__fds, nfds_t __nfds, int __timeout)
  |^~~~

Signed-off-by: Thomas Weißschuh 
---
 tools/testing/selftests/nolibc/nolibc-test.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/nolibc/nolibc-test.c 
b/tools/testing/selftests/nolibc/nolibc-test.c
index e2b70641a1e7..a0478f8eaee8 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -895,14 +895,14 @@ int run_syscall(int min, int max)
CASE_TEST(lseek_0);   EXPECT_SYSER(1, lseek(0, 0, 
SEEK_SET), -1, ESPIPE); break;
CASE_TEST(mkdir_root);EXPECT_SYSER(1, mkdir("/", 0755), 
-1, EEXIST); break;
CASE_TEST(mmap_bad);  EXPECT_PTRER(1, mmap(NULL, 0, 
PROT_READ, MAP_PRIVATE, 0, 0), MAP_FAILED, EINVAL); break;
-   CASE_TEST(munmap_bad);EXPECT_SYSER(1, munmap((void *)1, 
0), -1, EINVAL); break;
+   CASE_TEST(munmap_bad);EXPECT_SYSER(1, munmap(NULL, 0), 
-1, EINVAL); break;
CASE_TEST(mmap_munmap_good);  EXPECT_SYSZR(1, 
test_mmap_munmap()); break;
CASE_TEST(open_tty);  EXPECT_SYSNE(1, tmp = 
open("/dev/null", 0), -1); if (tmp != -1) close(tmp); break;
CASE_TEST(open_blah); EXPECT_SYSER(1, tmp = 
open("/proc/self/blah", 0), -1, ENOENT); if (tmp != -1) close(tmp); break;
CASE_TEST(pipe);  EXPECT_SYSZR(1, test_pipe()); 
break;
CASE_TEST(poll_null); EXPECT_SYSZR(1, poll(NULL, 0, 
0)); break;
CASE_TEST(poll_stdout);   EXPECT_SYSNE(1, ({ struct pollfd 
fds = { 1, POLLOUT, 0}; poll(&fds, 1, 0); }), -1); break;
-   CASE_TEST(poll_fault);EXPECT_SYSER(1, poll((void *)1, 
1, 0), -1, EFAULT); break;
+   CASE_TEST(poll_fault);EXPECT_SYSER(1, poll(NULL, 1, 0), 
-1, EFAULT); break;
CASE_TEST(prctl); EXPECT_SYSER(1, 
prctl(PR_SET_NAME, (unsigned long)NULL, 0, 0, 0), -1, EFAULT); break;
CASE_TEST(read_badf); EXPECT_SYSER(1, read(-1, &tmp, 
1), -1, EBADF); break;
CASE_TEST(rmdir_blah);EXPECT_SYSER(1, rmdir("/blah"), 
-1, ENOENT); break;
@@ -911,7 +911,7 @@ int run_syscall(int min, int max)
CASE_TEST(select_stdout); EXPECT_SYSNE(1, ({ fd_set fds; 
FD_ZERO(&fds); FD_SET(1, &fds); select(2, NULL, &fds, NULL, NULL); }), -1); 
break;
CASE_TEST(select_fault);  EXPECT_SYSER(1, select(1, (void 
*)1, NULL, NULL, 0), -1, EFAULT); break;
CASE_TEST(stat_blah); EXPECT_SYSER(1, 
stat("/proc/self/blah", &stat_buf), -1, ENOENT); break;
-   CASE_TEST(stat_fault);EXPECT_SYSER(1, stat((void *)1, 
&stat_buf), -1, EFAULT); break;
+   CASE_TEST(stat_fault);EXPECT_SYSER(1, stat(NULL, 
&stat_buf), -1, EFAULT); break;
CASE_TEST(stat_timestamps);   EXPECT_SYSZR(1, 
test_stat_timestamps()); break;
CASE_TEST(symlink_root);  EXPECT_SYSER(1, symlink("/", 
"/"), -1, EEXIST); break;
CASE_TEST(unlink_root);   EXPECT_SYSER(1, unlink("/"), -1, 
EISDIR); break;

---
base-commit: f7a6e4791e3d685eddca29b5d16d183ee0407caa
change-id: 20230910-nolibc-poll-fault-4152a6836ef8

Best regards,
-- 
Thomas Weißschuh 



Re: [PATCH] ftrace/selftests: Add softlink to latest log directory

2023-09-10 Thread Google
On Fri, 8 Sep 2023 18:17:21 -0400
Steven Rostedt  wrote:

> From: Steven Rostedt (Google) 
> 
> When I'm debugging something with the ftrace selftests and need to look at
> the logs, it becomes tedious that I need to do the following:
> 
>  ls -ltr logs
>  [ copy the last directory ]
>  ls logs/
> 
> to see where the logs are.
> 
> Instead, do the common practice of having a "latest" softlink to the last
> run selftest. This way after running the selftest I only need to do:
> 
>  ls logs/latest/
> 
> and it will always give me the directory of the last run selftest logs!
> 

Nice! I like this and this looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you!

> Signed-off-by: Steven Rostedt (Google) 
> ---
> 
> diff --git a/tools/testing/selftests/ftrace/ftracetest 
> b/tools/testing/selftests/ftrace/ftracetest
> index cb5f18c06593..7df8baa0f98f 100755
> --- a/tools/testing/selftests/ftrace/ftracetest
> +++ b/tools/testing/selftests/ftrace/ftracetest
> @@ -124,6 +124,7 @@ parse_opts() { # opts
>  ;;
>  --logdir|-l)
>LOG_DIR=$2
> +  LINK_PTR=
>shift 2
>  ;;
>  *.tc)
> @@ -181,7 +182,10 @@ fi
>  TOP_DIR=`absdir $0`
>  TEST_DIR=$TOP_DIR/test.d
>  TEST_CASES=`find_testcases $TEST_DIR`
> -LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
> +LOG_TOP_DIR=$TOP_DIR/logs
> +LOG_DATE=`date +%Y%m%d-%H%M%S`
> +LOG_DIR=$LOG_TOP_DIR/$LOG_DATE/
> +LINK_PTR=$LOG_TOP_DIR/latest
>  KEEP_LOG=0
>  KTAP=0
>  DEBUG=0
> @@ -207,6 +211,10 @@ else
>LOG_FILE=$LOG_DIR/ftracetest.log
>mkdir -p $LOG_DIR || errexit "Failed to make a log directory: $LOG_DIR"
>date > $LOG_FILE
> +  if [ "x-$LINK_PTR" != "x-" ]; then
> +unlink $LINK_PTR
> +ln -fs $LOG_DATE $LINK_PTR
> +  fi
>  fi
>  
>  # Define text colors


-- 
Masami Hiramatsu (Google) 


Re: [RESEND][PATCH] tracing/synthetic: Fix order of struct trace_dynamic_info

2023-09-10 Thread Google
On Fri, 8 Sep 2023 16:39:29 -0400
Steven Rostedt  wrote:

> From: "Steven Rostedt (Google)" 
> 
> To make handling BIG and LITTLE endian better the offset/len of dynamic
> fields of the synthetic events was changed into a structure of:
> 
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   u16 offset;
>   u16 len;
>  #else
>   u16 len;
>   u16 offset;
>  #endif
>  };
> 
> to replace the manual changes of:
> 
>  data_offset = offset & 0xffff;
>  data_offset |= len << 16;
> 
> But if you look closely, the above is:
> 
>  len << 16 | offset
> 
> Which in little endian would be in memory:
> 
>  offset_lo offset_hi len_lo len_hi
> 
> and in big endian:
> 
>  len_hi len_lo offset_hi offset_lo
> 
> Which if broken into a structure would be:
> 
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   u16 len;
>   u16 offset;
>  #else
>   u16 offset;
>   u16 len;
>  #endif
>  };
> 
> Which is the opposite of what was defined.
> 
> Fix this and just to be safe also add "__packed".
> 
> Link: https://lore.kernel.org/all/20230908154417.5172e...@gandalf.local.home/

Good catch! I'm not sure why this worked. Maybe we don't have any testcase
for this feature?

Acked-by: Masami Hiramatsu (Google) 

Thank you,

> 
> Cc: sta...@vger.kernel.org
> Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts")
> Signed-off-by: Steven Rostedt (Google) 
> ---
> 
>  [ Resending to the correct mailing list this time :-p ]
> 
>  include/linux/trace_events.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 12f875e9e69a..21ae37e49319 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -62,13 +62,13 @@ void trace_event_printf(struct trace_iterator *iter, 
> const char *fmt, ...);
>  /* Used to find the offset and length of dynamic fields in trace events */
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> - u16 offset;
>   u16 len;
> + u16 offset;
>  #else
> - u16 len;
>   u16 offset;
> + u16 len;
>  #endif
> -};
> +} __packed;
>  
>  /*
>   * The trace entry - the most basic unit of tracing. This is what
> -- 
> 2.40.1
> 


-- 
Masami Hiramatsu (Google) 


suspicious RCU usage warning on tracing/urgent

2023-09-10 Thread Google
Hi Steve,

I got this suspicious RCU usage warning when I ran ftracetest on
tracing/urgent branch.

[1] Basic trace file check[   17.172817] 
[   17.174621] =
[   17.177730] WARNING: suspicious RCU usage
[   17.180962] 6.5.0-10750-g595efe1079cd #47 Tainted: G N
[   17.185528] -
[   17.188685] fs/tracefs/event_inode.c:455 RCU-list traversed in non-reader 
section!!
[   17.194633] 
[   17.194633] other info that might help us debug this:
[   17.194633] 
[   17.200969] 
[   17.200969] rcu_scheduler_active = 2, debug_locks = 1
[   17.206086] 1 lock held by ftracetest/171:
[   17.209265]  #0: 829c2d30 (eventfs_srcu){.+.+}-{0:0}, at: 
dcache_dir_open_wrapper+0x3f/0x190
[   17.215551] 
[   17.215551] stack backtrace:
[   17.218498] CPU: 5 PID: 171 Comm: ftracetest Tainted: G N 
6.5.0-10750-g595efe1079cd #47
[   17.223364] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.15.0-1 04/01/2014
[   17.228206] Call Trace:
[   17.230076]  
[   17.231812]  dump_stack_lvl+0x66/0x80
[   17.234260]  lockdep_rcu_suspicious+0x158/0x1c0
[   17.237113]  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
[   17.240026]  dcache_dir_open_wrapper+0x14c/0x190
[   17.242663]  ? __pfx_dcache_dir_open_wrapper+0x10/0x10
[   17.245592]  do_dentry_open+0x16a/0x550
[   17.248203]  do_open+0x272/0x3d0
[   17.250584]  path_openat+0x119/0x290
[   17.253046]  ? __lock_acquire+0x504/0xb70
[   17.255658]  do_filp_open+0xb6/0x160
[   17.258015]  ? find_held_lock+0x2b/0x80
[   17.260421]  ? alloc_fd+0x12b/0x220
[   17.262839]  ? trace_preempt_on+0x7a/0x90
[   17.265763]  ? preempt_count_sub+0x4b/0x60
[   17.268631]  ? _raw_spin_unlock+0x2d/0x50
[   17.271249]  do_sys_openat2+0x96/0xd0
[   17.273499]  __x64_sys_openat+0x57/0xa0
[   17.275808]  do_syscall_64+0x3f/0x90
[   17.277995]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   17.281250] RIP: 0033:0x4bce8c
[   17.283498] Code: 24 18 31 c0 41 83 e2 40 75 44 89 f0 25 00 00 41 00 3d 00 
00 41 00 74 36 44 89 c2 4c 89 ce bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 44 48 8b 4c 24 18 64 48 33 0c 25 28 00 00 00
[   17.294543] RSP: 002b:7fffa59a3e20 EFLAGS: 0287 ORIG_RAX: 
0101
[   17.299479] RAX: ffda RBX:  RCX: 004bce8c
[   17.303935] RDX: 00090800 RSI: 0130b0c0 RDI: ff9c
[   17.308187] RBP: 7fffa59a4098 R08: 00090800 R09: 0130b0c0
[   17.312483] R10:  R11: 0287 R12: 013099ff
[   17.316401] R13: 0012 R14: 01309a00 R15: 0130b0c7
[   17.320543]  

But it seems to be correctly taking srcu_read_lock().

452 
453 ei = ti->private;
454 idx = srcu_read_lock(&eventfs_srcu);
455 list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
456 create_dentry(ef, dentry, false);
457 }
458 srcu_read_unlock(&eventfs_srcu, idx);
459 return dcache_dir_open(inode, file);
460 }
461 

This may be a false-positive warning, or srcu_read_lock() is not enough
for list_for_each_entry_rcu(). In the latter case, maybe we need to use
a mutex instead of srcu for updating the ef.
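If I am reading the lockdep helpers right, list_for_each_entry_rcu() by default only checks the classic RCU read-side contexts, which an SRCU read-side critical section does not satisfy; the macro accepts an optional lockdep condition for exactly this case. A kernel-side sketch (not a tested fix) of the eventfs loop with that condition:

```c
	idx = srcu_read_lock(&eventfs_srcu);
	/* Pass the SRCU reader state as the optional lockdep condition,
	 * so the RCU-list check knows this traversal is protected. */
	list_for_each_entry_rcu(ef, &ei->e_top_files, list,
				srcu_read_lock_held(&eventfs_srcu)) {
		create_dentry(ef, dentry, false);
	}
	srcu_read_unlock(&eventfs_srcu, idx);
```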

BTW, the ftracetest itself passed without any problem.

Thank you,

-- 
Masami Hiramatsu (Google) 


[PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions

2023-09-10 Thread SeongJae Park
DAMON provides damon_aggregated tracepoint, which exposes details of
each region and its access monitoring results.  It is useful for
getting whole monitoring results, e.g., for recording purposes.

For investigations of DAMOS, DAMON Sysfs interface provides DAMOS
statistics and tried_regions directory.  But, those provides only
statistics and snapshots.  If the scheme is frequently applied and if
the user needs to know every detail of DAMOS behavior, the
snapshot-based interface could be insufficient and expensive.

As a last resort, userspace users need to record all the monitoring
results via the damon_aggregated tracepoint and simulate how DAMOS
would have worked.  It is unnecessarily complicated.  DAMON kernel API
users, meanwhile, can do that easily via the before_damos_apply()
callback field of 'struct damon_callback', though.

Add a tracepoint that will be called just after the before_damos_apply()
callback for more convenient investigation of DAMOS.  The tracepoint
exposes all details about each region, similar to the damon_aggregated
tracepoint.

Please note that DAMOS is currently used not only for memory management
but also for query-like efficient retrieval of monitoring results (when
the 'stat' action is used).  Until now, only statistics or snapshots
were supported.  The addition of this tracepoint allows efficient full
recording of DAMOS-based filtered monitoring results.

Signed-off-by: SeongJae Park 
---
 include/trace/events/damon.h | 37 
 mm/damon/core.c  | 27 +-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 0b8d13bde17a..9e7b39495b05 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -9,6 +9,43 @@
 #include 
 #include 
 
+TRACE_EVENT(damos_before_apply,
+
+   TP_PROTO(unsigned int context_idx, unsigned int scheme_idx,
+   unsigned int target_idx, struct damon_region *r,
+   unsigned int nr_regions),
+
+   TP_ARGS(context_idx, scheme_idx, target_idx, r, nr_regions),
+
+   TP_STRUCT__entry(
+   __field(unsigned int, context_idx)
+   __field(unsigned int, scheme_idx)
+   __field(unsigned long, target_idx)
+   __field(unsigned long, start)
+   __field(unsigned long, end)
+   __field(unsigned int, nr_accesses)
+   __field(unsigned int, age)
+   __field(unsigned int, nr_regions)
+   ),
+
+   TP_fast_assign(
+   __entry->context_idx = context_idx;
+   __entry->scheme_idx = scheme_idx;
+   __entry->target_idx = target_idx;
+   __entry->start = r->ar.start;
+   __entry->end = r->ar.end;
+   __entry->nr_accesses = r->nr_accesses;
+   __entry->age = r->age;
+   __entry->nr_regions = nr_regions;
+   ),
+
+   TP_printk("ctx_idx=%u scheme_idx=%u target_idx=%lu nr_regions=%u 
%lu-%lu: %u %u",
+   __entry->context_idx, __entry->scheme_idx,
+   __entry->target_idx, __entry->nr_regions,
+   __entry->start, __entry->end,
+   __entry->nr_accesses, __entry->age)
+);
+
 TRACE_EVENT(damon_aggregated,
 
TP_PROTO(unsigned int target_id, struct damon_region *r,
diff --git a/mm/damon/core.c b/mm/damon/core.c
index ca631dd88b33..aa7fbcdf7310 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -950,6 +950,28 @@ static void damos_apply_scheme(struct damon_ctx *c, struct 
damon_target *t,
struct timespec64 begin, end;
unsigned long sz_applied = 0;
int err = 0;
+   /*
+* We plan to support multiple context per kdamond, as DAMON sysfs
+* implies with 'nr_contexts' file.  Nevertheless, only single context
+* per kdamond is supported for now.  So, we can simply use '0' context
+* index here.
+*/
+   unsigned int cidx = 0;
+   struct damos *siter;/* schemes iterator */
+   unsigned int sidx = 0;
+   struct damon_target *titer; /* targets iterator */
+   unsigned int tidx = 0;
+
+   damon_for_each_scheme(siter, c) {
+   if (siter == s)
+   break;
+   sidx++;
+   }
+   damon_for_each_target(titer, c) {
+   if (titer == t)
+   break;
+   tidx++;
+   }
 
if (c->ops.apply_scheme) {
if (quota->esz && quota->charged_sz + sz > quota->esz) {
@@ -964,8 +986,11 @@ static void damos_apply_scheme(struct damon_ctx *c, struct 
damon_target *t,
ktime_get_coarse_ts64(&begin);
if (c->callback.before_damos_apply)
err = c->callback.before_damos_apply(c, t, r, s);
-   if (!err)
+   if (!err) {
+   trace_damos_before_apply(cidx, sidx, tidx, 

[RFC v3 0/2] CPU-Idle latency selftest framework

2023-09-10 Thread Aboorva Devarajan
Changelog: v2 -> v3

* Minimal code refactoring
* Rebased on v6.6-rc1

RFC v1: 
https://lore.kernel.org/all/20210611124154.56427-1-psam...@linux.ibm.com/

RFC v2:
https://lore.kernel.org/all/20230828061530.126588-2-aboor...@linux.vnet.ibm.com/

Other related RFC:
https://lore.kernel.org/all/20210430082804.38018-1-psam...@linux.ibm.com/

Userspace selftest:
https://lkml.org/lkml/2020/9/2/356



A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.

The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latencies
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the measured latencies should be viewed relative to that baseline.

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/powerpc/latency_test/ for,

IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer

Sample output is as follows:

# --IPI Latency Test---
# Baseline Avg IPI latency(ns): 2720
# Observed Avg IPI latency(ns) - State snooze: 2565
# Observed Avg IPI latency(ns) - State stop0_lite: 3856
# Observed Avg IPI latency(ns) - State stop0: 3670
# Observed Avg IPI latency(ns) - State stop1: 3872
# Observed Avg IPI latency(ns) - State stop2: 17421
# Observed Avg IPI latency(ns) - State stop4: 1003922
# Observed Avg IPI latency(ns) - State stop5: 1058870
#
# --Timeout Latency Test--
# Baseline Avg timeout diff(ns): 1435
# Observed Avg timeout diff(ns) - State snooze: 1709
# Observed Avg timeout diff(ns) - State stop0_lite: 2028
# Observed Avg timeout diff(ns) - State stop0: 1954
# Observed Avg timeout diff(ns) - State stop1: 1895
# Observed Avg timeout diff(ns) - State stop2: 14556
# Observed Avg timeout diff(ns) - State stop4: 873988
# Observed Avg timeout diff(ns) - State stop5: 959137

Aboorva Devarajan (2):
  powerpc/cpuidle: cpuidle wakeup latency based on IPI and timer events
  powerpc/selftest: Add support for cpuidle latency measurement

 arch/powerpc/Kconfig.debug|  10 +
 arch/powerpc/kernel/Makefile  |   1 +
 arch/powerpc/kernel/test_cpuidle_latency.c| 154 ++
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../powerpc/cpuidle_latency/.gitignore|   2 +
 .../powerpc/cpuidle_latency/Makefile  |   6 +
 .../cpuidle_latency/cpuidle_latency.sh| 443 ++
 .../powerpc/cpuidle_latency/settings  |   1 +
 8 files changed, 618 insertions(+)
 create mode 100644 arch/powerpc/kernel/test_cpuidle_latency.c
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
 create mode 100755 
tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings

-- 
2.25.1



[RFC v3 1/2] powerpc/cpuidle: cpuidle wakeup latency based on IPI and timer events

2023-09-10 Thread Aboorva Devarajan
From: Pratik R. Sampat 

Introduce a mechanism to fire directed IPIs from a source CPU to a
specified target CPU and measure the time incurred on waking up the
target CPU in response.

Also, introduce a mechanism to queue a hrtimer on a specified CPU and
subsequently measure the time taken to wakeup the CPU.

Define a simple debugfs interface that allows for adjusting the
settings to trigger IPI and timer events on a designated CPU, and to
observe the resulting cpuidle wakeup latencies.

Reviewed-by: Srikar Dronamraju 
Signed-off-by: Pratik R. Sampat 
Signed-off-by: Aboorva Devarajan 
---
 arch/powerpc/Kconfig.debug |  10 ++
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/test_cpuidle_latency.c | 154 +
 3 files changed, 165 insertions(+)
 create mode 100644 arch/powerpc/kernel/test_cpuidle_latency.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 2a54fadbeaf5..e175fc3028ac 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -391,3 +391,13 @@ config KASAN_SHADOW_OFFSET
default 0xe000 if PPC32
default 0xa80e if PPC_BOOK3S_64
default 0xa8001c00 if PPC_BOOK3E_64
+
+config CPUIDLE_LATENCY_SELFTEST
+   tristate "Cpuidle latency selftests"
+   depends on CPU_IDLE
+   help
+ Provides a kernel module that runs tests using IPIs and
+ timers to measure cpuidle latency.
+
+ Say M if you want these self tests to build as a module.
+ Say N if you are unsure.
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2919433be355..3c5a576bbcf2 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -94,6 +94,7 @@ obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
 obj-$(CONFIG_PPC64)+= vdso64_wrapper.o
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
 obj-$(CONFIG_PPC_BOOK3S_IDLE)  += idle_book3s.o
+obj-$(CONFIG_CPUIDLE_LATENCY_SELFTEST)  += test_cpuidle_latency.o
 procfs-y   := proc_powerpc.o
 obj-$(CONFIG_PROC_FS)  += $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)  := rtas_pci.o
diff --git a/arch/powerpc/kernel/test_cpuidle_latency.c 
b/arch/powerpc/kernel/test_cpuidle_latency.c
new file mode 100644
index ..c93a8f76
--- /dev/null
+++ b/arch/powerpc/kernel/test_cpuidle_latency.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Module-based API test facility for cpuidle latency using IPIs and timers
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ * IPI based wakeup latencies
+ * Measure time taken for a CPU to wake up on an IPI sent from another CPU
+ * The latency measured also includes the latency of sending the IPI
+ */
+struct latency {
+   unsigned int src_cpu;
+   unsigned int dest_cpu;
+   ktime_t time_start;
+   ktime_t time_end;
+   u64 latency_ns;
+} ipi_wakeup;
+
+static void measure_latency(void *info)
+{
+   struct latency *v = (struct latency *)info;
+   ktime_t time_diff;
+
+   v->time_end = ktime_get();
+   time_diff = ktime_sub(v->time_end, v->time_start);
+   v->latency_ns = ktime_to_ns(time_diff);
+}
+
+void run_smp_call_function_test(unsigned int cpu)
+{
+   ipi_wakeup.src_cpu = smp_processor_id();
+   ipi_wakeup.dest_cpu = cpu;
+   ipi_wakeup.time_start = ktime_get();
+   smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+
+/*
+ * Timer based wakeup latencies
+ * Measure time taken for a CPU to wake up on a timer being armed and fired
+ */
+struct timer_data {
+   unsigned int src_cpu;
+   u64 timeout;
+   ktime_t time_start;
+   ktime_t time_end;
+   struct hrtimer timer;
+   u64 timeout_diff_ns;
+} timer_wakeup;
+
+static enum hrtimer_restart hrtimer_callback(struct hrtimer *hrtimer)
+{
+   struct timer_data *w;
+   ktime_t time_diff;
+
+   w = container_of(hrtimer, struct timer_data, timer);
+   w->time_end = ktime_get();
+
+   time_diff = ktime_sub(w->time_end, w->time_start);
+   time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
+   w->timeout_diff_ns = ktime_to_ns(time_diff);
+   return HRTIMER_NORESTART;
+}
+
+static void run_timer_test(unsigned int ns)
+{
+   hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   timer_wakeup.timer.function = hrtimer_callback;
+   timer_wakeup.src_cpu = smp_processor_id();
+   timer_wakeup.timeout = ns;
+   timer_wakeup.time_start = ktime_get();
+
+   hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
+ HRTIMER_MODE_REL_PINNED);
+}
+
+static struct dentry *dir;
+
+static int cpu_read_op(void *data, u64 *dest_cpu)
+{
+   *dest_cpu = ipi_wakeup.dest_cpu;
+   return 0;
+}
+
+/*
+ * Send a directed IPI from the current CPU (source) to the destination CPU and
+ * measure the latency on wakeup.
+ */
+static int cpu_write_op(void *data, u64 val

[RFC v3 2/2] powerpc/selftest: Add support for cpuidle latency measurement

2023-09-10 Thread Aboorva Devarajan
From: Pratik R. Sampat 

The cpuidle latency selftest provides support to systematically extract,
analyse and present IPI and timer based wakeup latencies for each CPU
and each idle state available on the system.

The selftest leverages test_cpuidle_latency module's debugfs interface
to interact and extract latency information from the kernel.

The selftest inserts the module if already not inserted, disables all
the idle states and enables them one by one testing the following:

1. Keeping the source CPU constant, iterate through all the cores,
   picking a single CPU per core, and measure the IPI latency first
   for a baseline (with the CPU busy running a
   cat /dev/random > /dev/null workload) and then with the CPU idle.
2. Iterating through all the cores and selecting one CPU per core,
   arm a timer on the selected target CPU whose expected duration
   equals the residency of the deepest enabled idle state, then
   determine the difference between the expected timer duration and
   the actual time of wakeup.

To run this test specifically:
$ sudo make -C tools/testing/selftests \
  TARGETS="powerpc/cpuidle_latency" run_tests

There are a few optional arguments too that the script can take
[-h ]
[-i ]
[-m ]
[-s ]
[-o ]
[-v  (run on all cpus)]

Default Output location in:
tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.log

To run the test without re-compiling:
$ cd tools/testing/selftest/powerpc/cpuidle_latency/
$ sudo ./cpuidle_latency.sh

Reviewed-by: Srikar Dronamraju 
Signed-off-by: Pratik R. Sampat 
Signed-off-by: Aboorva Devarajan 
---
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../powerpc/cpuidle_latency/.gitignore|   2 +
 .../powerpc/cpuidle_latency/Makefile  |   6 +
 .../cpuidle_latency/cpuidle_latency.sh| 443 ++
 .../powerpc/cpuidle_latency/settings  |   1 +
 5 files changed, 453 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
 create mode 100755 tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 49f2ad1793fd..efac7270ce1f 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -17,6 +17,7 @@ SUB_DIRS = alignment  \
   benchmarks   \
   cache_shape  \
   copyloops\
+  cpuidle_latency  \
   dexcr\
   dscr \
   mm   \
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore 
b/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
new file mode 100644
index ..987f8852dc59
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+cpuidle_latency.log
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/Makefile 
b/tools/testing/selftests/powerpc/cpuidle_latency/Makefile
new file mode 100644
index ..04492b6d2582
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := cpuidle_latency.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh 
b/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
new file mode 100755
index ..c6b1beffa85f
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
@@ -0,0 +1,443 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# CPU-Idle latency selftest enables systematic retrieval and presentation
+# of IPI and timer-triggered wake-up latencies for every CPU and available
+# system idle state by leveraging the test_cpuidle_latency module.
+#
+# Author: Pratik R. Sampat  
+# Author: Aboorva Devarajan 
+
+DISABLE=1
+ENABLE=0
+
+LOG=cpuidle_latency.log
+MODULE=/lib/modules/$(uname -r)/kernel/arch/powerpc/kernel/test_cpuidle_latency.ko
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+exit_status=0
+
+RUN_TIMER_TEST=1
+TIMEOUT=100
+VERBOSE=0
+
+IPI_SRC_CPU=0
+
+helpme() {
+printf "Usage: %s [-h] [-todg args]
+   [-h ]
+   [-s  (default: 0)]
+   [-m ]
+   [-o ]
+   [-v  (execute test across all CPU threads)]
+   [-i ]
+   \n" "$0"
+exit 2
+}
+
+cpu_is_online() {
+local cpu=$1
+if [ ! -f "/sys/devices/system/cpu/cpu$cpu/online" ]; then
+printf "CPU %s: file not found: /sys/devices/system/cpu/cpu%s/online" "$cpu" "$cpu"
+return 0
+fi
+status=$(cat /sys/devices/system/cpu/cpu"$cpu"/online)
+return "$status"
+}
+
+chec

RE: [PATCH V6 3/7] cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.

2023-09-10 Thread Meng, Li (Jassmine)

Hi Peter:

> -Original Message-
> From: Peter Zijlstra 
> Sent: Sunday, September 10, 2023 1:40 AM
> To: Meng, Li (Jassmine) 
> Cc: Rafael J . Wysocki ; Huang, Ray
> ; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org; x...@kernel.org; linux-a...@vger.kernel.org; Shuah
> Khan ; linux-kselft...@vger.kernel.org;
> Fontenot, Nathan ; Sharma, Deepak
> ; Deucher, Alexander
> ; Limonciello, Mario
> ; Huang, Shimmer
> ; Yuan, Perry ; Du,
> Xiaojian ; Viresh Kumar ;
> Borislav Petkov 
> Subject: Re: [PATCH V6 3/7] cpufreq: amd-pstate: Enable amd-pstate
> preferred core supporting.
>
> On Fri, Sep 08, 2023 at 03:46:49PM +0800, Meng Li wrote:
> > +static void amd_pstate_init_prefcore(void) {
> > + int cpu, ret;
> > + u64 highest_perf;
> > +
> > + if (!prefcore)
> > + return;
> > +
> > + for_each_online_cpu(cpu) {
> > + ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
> > + if (ret)
> > + break;
> > +
> > + sched_set_itmt_core_prio(highest_perf, cpu);
> > +
> > + /* check if CPPC preferred core feature is enabled*/
> > + if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
> > + pr_debug("AMD CPPC preferred core is unsupported!\n");
> > + hw_prefcore = false;
> > + prefcore = false;
> > + return;
> > + }
> > + }
> > +
> > + /*
> > +  * This code can be run during CPU online under the
> > +  * CPU hotplug locks, so sched_set_amd_prefcore_support()
> > +  * cannot be called from here.  Queue up a work item
> > +  * to invoke it.
> > +  */
> > + schedule_work(&sched_prefcore_work);
> > +}
>
> Brilliant, repost without addressing prior feedback..  :-(
[Meng, Li (Jassmine)]
I am very sorry that I did not receive your V5 review comments before
sending the V6 patches.
However, thank you very much for your review.
I will address them one by one in the upcoming V7 patches.
Thank you very much!


Re: [RESEND][PATCH] tracing/synthetic: Fix order of struct trace_dynamic_info

2023-09-10 Thread Sven Schnelle
Hi Steven,

Steven Rostedt  writes:

> From: "Steven Rostedt (Google)" 
>
> To make handling BIG and LITTLE endian better the offset/len of dynamic
> fields of the synthetic events was changed into a structure of:
>
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   u16 offset;
>   u16 len;
>  #else
>   u16 len;
>   u16 offset;
>  #endif
>  };
>
> to replace the manual changes of:
>
>  data_offset = offset & 0xffff;
>  data_offset |= len << 16;
>
> But if you look closely, the above is:
>
>  len << 16 | offset
>
> Which in little endian would be in memory:
>
>  offset_lo offset_hi len_lo len_hi
>
> and in big endian:
>
>  len_hi len_lo offset_hi offset_lo
>
> Which if broken into a structure would be:
>
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   u16 len;
>   u16 offset;
>  #else
>   u16 offset;
>   u16 len;
>  #endif
>  };
>
> Which is the opposite of what was defined.
>
> Fix this and just to be safe also add "__packed".
>
> Link: https://lore.kernel.org/all/20230908154417.5172e...@gandalf.local.home/
>
> Cc: sta...@vger.kernel.org
> Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts")
> Signed-off-by: Steven Rostedt (Google) 
> ---
>
>  [ Resending to the correct mailing list this time :-p ]
>
>  include/linux/trace_events.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 12f875e9e69a..21ae37e49319 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -62,13 +62,13 @@ void trace_event_printf(struct trace_iterator *iter, 
> const char *fmt, ...);
>  /* Used to find the offset and length of dynamic fields in trace events */
>  struct trace_dynamic_info {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> - u16 offset;
>   u16 len;
> + u16 offset;
>  #else
> - u16 len;
>   u16 offset;
> + u16 len;
>  #endif
> -};
> +} __packed;
>  
>  /*
>   * The trace entry - the most basic unit of tracing. This is what

This issue was also present on BE but, as you noted, was "covered" by
the broken test case. With this patch everything works as expected. So:

Tested-by: Sven Schnelle 



Re: [PATCH] selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings

2023-09-10 Thread Willy Tarreau
Hi Thomas,

On Sun, Sep 10, 2023 at 09:29:01PM +0200, Thomas Weißschuh wrote:
> Newer versions of glibc annotate the poll() function with
> __attribute__(access) which triggers a compiler warning inside the
> testcase poll_fault.
> Avoid this by using a plain NULL which is enough for the testcase.
> To avoid potential future warnings also adapt the other EFAULT
> testcases, except select_fault as NULL is a valid value for its
> argument.
(...)

Looks good to me. I wouldn't be surprised if we're soon forced to do
the same with select() on some archs where it might be emulated.

Feel free to push it to the shared repo.

Thanks!
Willy


Re: [PATCH] selftests: ALSA: remove unused variables

2023-09-10 Thread Takashi Iwai
On Fri, 08 Sep 2023 10:10:40 +0200,
Ding Xiang wrote:
> 
> These variables are never referenced in the code, just remove them.
> 
> Signed-off-by: Ding Xiang 

Thanks, applied.


Takashi