RE: [PATCH v6 1/6] iommu: Add cache_invalidate_user op

2023-11-19 Thread Tian, Kevin
> From: Yi Liu 
> Sent: Friday, November 17, 2023 9:07 PM
> 
> +/**
> + * struct iommu_user_data_array - iommu driver specific user space data
> array
> + * @type: The data type of all the entries in the user buffer array
> + * @uptr: Pointer to the user buffer array for copy_from_user()

remove "for copy_from_user()"

> + * @entry_len: The fixed-width length of a entry in the array, in bytes

s/a/an/

> + * @entry_num: The number of total entries in the array
> + *
> + * A array having a @entry_num number of @entry_len sized entries, each
> entry is
> + * user space data, an uAPI defined in include/uapi/linux/iommufd.h where
> @type
> + * is also defined as enum iommu_xyz_data_type.

* The user buffer array has @entry_num entries, each with a fixed size
 * @entry_len. The entry format and related type are defined in
 * include/uapi/linux/iommufd.h



Re: [PATCH 2/3] MAINTAINERS: Require kvm-xfstests smoke for ext4

2023-11-19 Thread Theodore Ts'o
On Fri, Nov 17, 2023 at 12:39:56PM +0530, Chandan Babu R wrote:
> IMHO, for XFS, the value of the "V" field should refer to xfstests rather than a
> framework built around xfstests. This is because the xfstests project contains the
> actual tests, and also we could have several frameworks (e.g. Kdevops) for
> running xfstests.

For ext4, what I plan to do is to start requiring, in a soon-to-be
written:

Documentation/process/maintainer-ext4.rst

that *all* patches (exempting spelling/grammar nits which only touch
comments and result in zero changes in the compiled .o file) will
require running the xfstests smoke test.  The problem is that for
newbie patch submitters, and for "drive-by" patches from, say, mm
developers, installing and configuring xfstests, and then figuring out
how to set up all of the config files and/or environment variables,
before you get to running "./check -g smoke", is really f**king hard.

As far as other test frameworks are concerned, I suggest that you ask
a new college grad that your company has just hired, or a new intern,
or a new GSOC or Outreachy intern, to apply a patch to the upstream
linux tree --- given only the framework's documentation, and with
***no*** help from anyone who knows how to use the framework.  Just
sit on your hands, and try as best you can to keep your mouth
shut and wait.

I spent a lot of work to make the instructions here:

  
https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

easy enough that it meets this criterion.  What should be in the V:
field for the MAINTAINERS file is less clear to me, because it's not
clear whether we want it to be Dead Stupid Easy, so that anyone capable
of creating a kernel patch can figure out how to run the tests.

My personal opinion is that for someone who is running through the
Kernel Newbies' "My First Kernel Patch", say, doing something trivial
such as changing "(a < b) ? a : b" to "min(a, b)", being able to
compile the kernel, and then following the kernel documentation to send
a patch upstream, the documentation and procedure necessary to do
something better than "it builds, ship it!" should be within that
Kernel Newbie's reach.

It's also my considered opinion that neither bare, upstream xfstests
*nor* kdevops meets this criterion.  For example, kdevops may claim
that you only need a few commands:

 "make menuconfig"
 "make"
 "make bringup"
 "make linux"
 "make fstests"
 "make fstests-baseline"
 "make fstests-results"

... but to be honest, I never got past the 2nd step before *I* got
massively confused and decided there were better things to do with my
time.  First, "make menuconfig" requires that you answer a whole
bunch of questions, and it's not clear how you are supposed to answer
them all.  Secondly, "make" issues a huge number of commands, and then
fails because I didn't run them as root.  But I won't download random
git repositories and "make" as root.  It goes against the grain.  And
so after trying to figure out what it did, and whether or not it was
safe, I ultimately gave up on it.

For upstream xfstests, ask a New College Grad fresh hire, or intern,
to read xfstests's README file, and ask them to set up xfstests,
without help.  Go ahead, I'll wait.

No doubt for someone who is willing to make a huge investment in time
to become a file system developer specializing in that subsystem, you
will eventually be able to figure it out.  And in the case of kdevops,
eventually I could set up my own VM, install a kernel compile
toolchain and all of kdevops's prerequisites, and then run "make" and
"make linux" in that VM.  But that's a lot more than just "four
commands".

So as for *me*, I'm going to point people at:

https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

because I've simplified a lot of things, up to and including having
kvm-xfstests automatically download the test appliance VM image from
kernel.org if necessary.  When it lists the handful of commands that
need to be run, it includes downloading all of the prerequisite
packages, there are no complex menu configuration steps, and it doesn't
require running random Makefiles and Makefile fragments as
root.

(And note that I keep the xfstests-bld repos on kernel.org and
github.com both up to date, and I prefer using the github.com
URL because it's easier for the new developer to read and understand
it.)

Ultimately, at least for my planned
Documentation/process/maintainer-ext4.rst, before I could make running
a smoke test mandatory, I needed to make sure that the quickstart
documentation for kvm-xfstests be made as simple and as fool-proof as
possible.  That was extremely important to me from a personal point
of view.

As far as what should go into Documentation/process/tests.rst, for
"kvm-xfstests smoke" this may depend on whether other file system
maintainers want to adopt something which is similarly sim

Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-11-19 Thread Song Liu
On Sun, Nov 19, 2023 at 12:03 AM Akihiko Odaki  wrote:
>
[...]
>
> Unfortunately no. The communication with the userspace can be done with
> two different means:
> - usual socket read/write
> - vhost for direct interaction with a KVM guest
>
> The BPF map may be a valid option for socket read/write, but it is not
> for vhost. In-kernel vhost may fetch hash from the BPF map, but I guess
> it's not a standard way to have an interaction between the kernel code
> and a BPF program.

I am very new to areas like vhost and KVM. So I don't really follow.
Does this mean we have the guest kernel reading data from host eBPF
programs (loaded by Qemu)?

> >
> >>
> >> Unfortunately, however, it is not acceptable for the BPF subsystem
> >> because the "stable" BPF is completely fixed these days. The
> >> "unstable/kfunc" BPF is an alternative, but the eBPF program will be
> >> shipped with a portable userspace program (QEMU)[1] so the lack of
> >> interface stability is not tolerable.
> >
> > bpf kfuncs are as stable as exported symbols. Is exported symbols
> > like stability enough for the use case? (I would assume yes.)
> >
> >>
> >> Another option is to hardcode the algorithm that was conventionally
> >> implemented with eBPF steering program in the kernel[2]. It is possible
> >> because the algorithm strictly follows the virtio-net specification[3].
> >> However, there are proposals to add different algorithms to the
> >> specification[4], and hardcoding the algorithm to the kernel will
> >> require to add more UAPIs and code each time such a specification change
> >> happens, which is not good for tuntap.
> >
> > The requirement looks similar to hid-bpf. Could you explain why that
> > model is not enough? HID also requires some stability AFAICT.
>
> I have little knowledge with hid-bpf, but I assume it is more like a
> "safe" kernel module; in my understanding, it affects the system state
> and is intended to be loaded with some kind of a system daemon. It is
> fine to have the same lifecycle with the kernel for such a BPF program;
> whenever the kernel is updated, the distributor can recompile the BPF
> program with the new kernel headers and ship it along with the kernel
> just as like a kernel module.
>
> In contrast, our intended use case is more like a normal application.
> So, for example, a user may download a container and run QEMU (including
> the BPF program) installed in the container. As such, it is nice if the
> ABI is stable across kernel releases, but it is not guaranteed for
> kfuncs. Such a use case is already covered with the eBPF steering
> program so I want to maintain it if possible.

TBH, I don't think stability should be a concern for kfuncs used by QEMU.
Many core BPF APIs are now implemented as kfuncs: bpf_dynptr_*,
bpf_rcu_*, etc. As long as there are valid use cases, these kfuncs will
be supported.

Thanks,
Song



[PATCH v2] Kunit to check the longest symbol length

2023-11-19 Thread Sergio González Collado
The longest length of a symbol (KSYM_NAME_LEN) was increased to 512
in the reference [1]. This patch adds a kunit test to check the longest
symbol length.

[1] https://lore.kernel.org/lkml/20220802015052.10452-6-oj...@kernel.org/

Tested-by: Martin Rodriguez Reboredo 
Signed-off-by: Sergio González Collado 
---
V1 -> V2: corrected CI tests. Added fix proposed at [2]

[2] 
https://lore.kernel.org/lkml/y9es4ukl%2f+dtv...@gmail.com/T/#m3ef0e12bb834d01ed1ebdcae12ef5f2add342077
---
 arch/x86/tools/insn_decoder_test.c |   3 +-
 lib/Kconfig.debug  |   9 +++
 lib/Makefile   |   2 +
 lib/longest_symbol_kunit.c | 122 +
 4 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 lib/longest_symbol_kunit.c

diff --git a/arch/x86/tools/insn_decoder_test.c 
b/arch/x86/tools/insn_decoder_test.c
index 472540aeabc2..6c2986d2ad11 100644
--- a/arch/x86/tools/insn_decoder_test.c
+++ b/arch/x86/tools/insn_decoder_test.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define unlikely(cond) (cond)
 
@@ -106,7 +107,7 @@ static void parse_args(int argc, char **argv)
}
 }
 
-#define BUFSIZE 256
+#define BUFSIZE (256 + KSYM_NAME_LEN)
 
 int main(int argc, char **argv)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..a531abece0a7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2769,6 +2769,15 @@ config FORTIFY_KUNIT_TEST
  by the str*() and mem*() family of functions. For testing runtime
  traps of FORTIFY_SOURCE, see LKDTM's "FORTIFY_*" tests.
 
+config LONGEST_SYM_KUNIT_TEST
+   tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS
+   depends on KUNIT && KPROBES
+   default KUNIT_ALL_TESTS
+   help
+ Tests the longest symbol possible
+
+ If unsure, say N.
+
 config HW_BREAKPOINT_KUNIT_TEST
bool "Test hw_breakpoint constraints accounting" if !KUNIT_ALL_TESTS
depends on HAVE_HW_BREAKPOINT
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..f72003d5869b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -406,6 +406,8 @@ obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o
 obj-$(CONFIG_STRCAT_KUNIT_TEST) += strcat_kunit.o
 obj-$(CONFIG_STRSCPY_KUNIT_TEST) += strscpy_kunit.o
 obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o
+obj-$(CONFIG_LONGEST_SYM_KUNIT_TEST) += longest_symbol_kunit.o
+CFLAGS_longest_symbol_kunit.o += $(call cc-disable-warning, missing-prototypes)
 
 obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
 
diff --git a/lib/longest_symbol_kunit.c b/lib/longest_symbol_kunit.c
new file mode 100644
index ..998563018f7a
--- /dev/null
+++ b/lib/longest_symbol_kunit.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test the longest symbol length. Execute with:
+ *  ./tools/testing/kunit/kunit.py run longest-symbol
+ *  --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y
+ *  --kconfig_add CONFIG_RETPOLINE=n
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+
+#define DI(name) s##name##name
+#define DDI(name) DI(n##name##name)
+#define DDDI(name) DDI(n##name##name)
+#define DDDDI(name) DDDI(n##name##name)
+#define DDDDDI(name) DDDDI(n##name##name)
+
+#define PLUS1(name) __PASTE(name, e)
+
+/* Generate a symbol whose name length is 511 */
+#define LONGEST_SYM_NAME DDDDDI(g1h2i3j4k5l6m7n)
+
+/* Generate a symbol whose name length is 512 */
+#define LONGEST_SYM_NAME_PLUS1 PLUS1(LONGEST_SYM_NAME)
+
+#define RETURN_LONGEST_SYM 0xA
+#define RETURN_LONGEST_SYM_PLUS1 0x5
+
+noinline int LONGEST_SYM_NAME(void);
+noinline int LONGEST_SYM_NAME(void)
+{
+   return RETURN_LONGEST_SYM;
+}
+
+noinline int LONGEST_SYM_NAME_PLUS1(void);
+noinline int LONGEST_SYM_NAME_PLUS1(void)
+{
+   return RETURN_LONGEST_SYM_PLUS1;
+}
+
+_Static_assert(sizeof(__stringify(LONGEST_SYM_NAME)) == KSYM_NAME_LEN,
+"Incorrect symbol length found. Expected KSYM_NAME_LEN: "
+__stringify(KSYM_NAME_LEN) ", but found: "
+__stringify(sizeof(LONGEST_SYM_NAME)));
+
+static void test_longest_symbol(struct kunit *test)
+{
+   KUNIT_EXPECT_EQ(test, RETURN_LONGEST_SYM, LONGEST_SYM_NAME());
+}
+
+static void test_longest_symbol_kallsyms(struct kunit *test)
+{
+   unsigned long (*kallsyms_lookup_name)(const char *name);
+   static int (*longest_sym)(void);
+
+   struct kprobe kp = {
+   .symbol_name = "kallsyms_lookup_name",
+   };
+
+   if (register_kprobe(&kp) < 0) {
+   pr_info("%s: kprobe not registered\n", __func__);
+   KUNIT_FAIL(test, "test_longest_symbol kallsyms: kprobe not registered\n");
+   return;
+   }
+
+   kunit_warn(test, "test_longest_symbol kallsyms: kprobe registered\n");
+   kallsyms_lookup_name = (unsigned long (*)(const char *name))kp.addr;
+   unregister_kprobe(&kp);
+
+   longest_sym =
+   (void *) kalls

[PATCH 2/2] mm/damon/core-test: test damon_split_region_at()'s access rate copying

2023-11-19 Thread SeongJae Park
damon_split_region_at() should set the access rate related fields of the
resulting regions to the same values.  That could be forgotten, and
actually there was such a mistake before.  Test it with the unit test
case for the function.

Signed-off-by: SeongJae Park 
---
 mm/damon/core-test.h | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h
index 649adf91ebc5..e6a01ea2ec54 100644
--- a/mm/damon/core-test.h
+++ b/mm/damon/core-test.h
@@ -122,18 +122,25 @@ static void damon_test_split_at(struct kunit *test)
 {
struct damon_ctx *c = damon_new_ctx();
struct damon_target *t;
-   struct damon_region *r;
+   struct damon_region *r, *r_new;
 
t = damon_new_target();
r = damon_new_region(0, 100);
+   r->nr_accesses_bp = 42;
+   r->nr_accesses = 42;
+   r->last_nr_accesses = 15;
damon_add_region(r, t);
damon_split_region_at(t, r, 25);
KUNIT_EXPECT_EQ(test, r->ar.start, 0ul);
KUNIT_EXPECT_EQ(test, r->ar.end, 25ul);
 
-   r = damon_next_region(r);
-   KUNIT_EXPECT_EQ(test, r->ar.start, 25ul);
-   KUNIT_EXPECT_EQ(test, r->ar.end, 100ul);
+   r_new = damon_next_region(r);
+   KUNIT_EXPECT_EQ(test, r_new->ar.start, 25ul);
+   KUNIT_EXPECT_EQ(test, r_new->ar.end, 100ul);
+
+   KUNIT_EXPECT_EQ(test, r->nr_accesses_bp, r_new->nr_accesses_bp);
+   KUNIT_EXPECT_EQ(test, r->nr_accesses, r_new->nr_accesses);
+   KUNIT_EXPECT_EQ(test, r->last_nr_accesses, r_new->last_nr_accesses);
 
damon_free_target(t);
damon_destroy_ctx(c);
-- 
2.34.1




Re: [PATCH v4 0/5] cgroup/cpuset: Improve CPU isolation in isolated partitions

2023-11-19 Thread Tejun Heo
On Wed, Nov 15, 2023 at 10:34:00PM -0500, Waiman Long wrote:
> v4:
>  - Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
>down into CONFIG_SYSFS block to avoid compilation warnings.

I already applied v3 to cgroup/for-6.8. Can you please send the fix up patch
against that branch?

Thanks.

-- 
tejun



Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback

2023-11-19 Thread Chris Li
On Fri, Nov 17, 2023 at 8:23 AM Nhat Pham  wrote:
>
> On Thu, Nov 16, 2023 at 4:57 PM Chris Li  wrote:
> >
> > Hi Nhat,
> >
> > I want to share the high level feedback we discussed here in the
> > mailing list as well.
> >
> > It is my observation that each memcg LRU list can't compare the page
> > time order with other memcg.
> > It works great when the leaf level memcg hits the memory limit and you
> > want to reclaim from that memcg.
> > It works less well on global memory pressure where you need to reclaim
> > from all memcgs. You kind of have to
> > scan all the child memcgs to find out the best page to shrink from. It
> > is less effective to get to the most desirable page quickly.
> >
> > This can benefit from a design similar to MGLRU. This idea is
> > suggested by Yu Zhao, credit goes to him not me.
> > In other words, the current patch is similar to the memcg page list
> > pre MGLRU world. We can have a MRLRU
> > like per memcg zswap shrink list.
>
> I was gonna summarize the points myself :P But thanks for doing this.
> It's your idea so you're more qualified to explain this anyway ;)

The MGLRU-like shrinker was Yu Zhao's idea. I just observed the problem.

>
> I absolutely agree that having a generation-aware cgroup-aware
> NUMA-aware LRU is the future way to go. Currently, IIUC, the reclaim logic
> selects cgroups in a round-robin-ish manner. It's "fair" in this perspective,
> but I also think it's not ideal. As we have discussed, the current list_lru
> infrastructure only takes into account intra-cgroup relative recency, not
> inter-cgroup relative recency. The recently proposed time-based zswap
> reclaim mechanism will provide us with a source of information, but the
> overhead of using this might be too high - and it's very zswap-specific.

I don't mind it being zswap-specific, as long as it is effective.
The overhead is twofold:
1) memory overhead for storing a timestamp per compressed page.
2) cpu overhead for reading timestamps.
Using MGLRU likely has advantages over timestamps on both memory and
cpu: the generation can use fewer bits and doesn't require reading the
time on every page.

> Maybe after this, we should improve zswap reclaim (and perhaps all
> list_lru users) by adding generations to list_lru then take generations
> into account in the vmscan code. This patch series could be merged

One high level idea is that we can get the page generation in the
MGLRU before it gets into zswap. Just retain the generation in the
zpool LRU somehow.

> as-is, and once we make list_lru generation-aware, zswap shrinker
> will automagically be improved (along with all other list_lru/shrinker
> users).

I don't think it will automatically improve; you will need to rewrite
a lot of code in the shrinker as well to make the best use of an MGLRU zpool.

>
> I don't know enough about the current design of MGLRU to comment
> too much further, but let me know if this makes sense, and if you have
> objections/other ideas.

Taking the step by step approach is fine by me as long as we are
making steady progress towards the better end goal.

>
> And if you have other documentations for MGLRU than its code, could
> you please let me know? I'm struggling to find more details about this.

I would need to learn MGLRU myself. We can share and compare notes
when we get to it.

Chris



Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-11-19 Thread Akihiko Odaki
On 2023/11/19 1:08, Song Liu wrote:
> Hi,
>
> A few rookie questions below.

Thanks for questions.

> On Sat, Nov 18, 2023 at 2:39 AM Akihiko Odaki  wrote:
>> On 2023/10/18 4:19, Akihiko Odaki wrote:
>>> On 2023/10/18 4:03, Alexei Starovoitov wrote:
>>> [...]
>
> I would also appreciate if you have some documentation or link to
> relevant discussions on the mailing list. That will avoid having same
> discussion you may already have done in the past.
>
>> Hi,
>>
>> The discussion has been stuck for a month, but I'd still like to
>> continue figuring out the way best for the whole kernel to implement
>> this feature. I summarize the current situation and question that needs
>> to be answered before push this forward:
>>
>> The goal of this RFC is to allow to report hash values calculated with
>> eBPF steering program. It's essentially just to report 4 bytes from the
>> kernel to the userspace.
>
> AFAICT, the proposed design is to have BPF generate some data
> (namely hash, but could be anything afaict) and consume it from
> user space. Instead of updating __sk_buff, can we have the user
> space to fetch the data/hash from a bpf map? If this is an option,
> I guess we can implement the same feature with BPF tracing
> programs?

Unfortunately no. The communication with the userspace can be done with
two different means:
- usual socket read/write
- vhost for direct interaction with a KVM guest

The BPF map may be a valid option for socket read/write, but it is not
for vhost. In-kernel vhost may fetch hash from the BPF map, but I guess
it's not a standard way to have an interaction between the kernel code
and a BPF program.

>> Unfortunately, however, it is not acceptable for the BPF subsystem
>> because the "stable" BPF is completely fixed these days. The
>> "unstable/kfunc" BPF is an alternative, but the eBPF program will be
>> shipped with a portable userspace program (QEMU)[1] so the lack of
>> interface stability is not tolerable.
>
> bpf kfuncs are as stable as exported symbols. Is exported symbols
> like stability enough for the use case? (I would assume yes.)
>
>> Another option is to hardcode the algorithm that was conventionally
>> implemented with eBPF steering program in the kernel[2]. It is possible
>> because the algorithm strictly follows the virtio-net specification[3].
>> However, there are proposals to add different algorithms to the
>> specification[4], and hardcoding the algorithm to the kernel will
>> require to add more UAPIs and code each time such a specification change
>> happens, which is not good for tuntap.
>
> The requirement looks similar to hid-bpf. Could you explain why that
> model is not enough? HID also requires some stability AFAICT.

I have little knowledge with hid-bpf, but I assume it is more like a
"safe" kernel module; in my understanding, it affects the system state
and is intended to be loaded with some kind of a system daemon. It is
fine to have the same lifecycle with the kernel for such a BPF program;
whenever the kernel is updated, the distributor can recompile the BPF
program with the new kernel headers and ship it along with the kernel
just as like a kernel module.

In contrast, our intended use case is more like a normal application.
So, for example, a user may download a container and run QEMU (including
the BPF program) installed in the container. As such, it is nice if the
ABI is stable across kernel releases, but it is not guaranteed for
kfuncs. Such a use case is already covered with the eBPF steering
program so I want to maintain it if possible.

Regards,
Akihiko Odaki