from:"Luis Chamberlain"

Re: [PATCH] rust: add `module_params` macro

2024-07-18 Thread Luis Chamberlain

On Tue, Jul 09, 2024 at 12:08:16PM +0200, Miguel Ojeda wrote:
> On Mon, Jul 8, 2024 at 11:42 PM Luis Chamberlain  wrote:
> >
> > The rationale here is that a rust binding means commitment then also
> > from fresh blood to help co-maintain review C / Rust for exising code
> > when there is will / desire to collaborate from an existing C maintainer.
> >
> > I realize this may be a lot to ask, but I think this is one of the
> > responsible ways to ask to scale here.
> 
> But, yes, I think Rust is a great opportunity to get new
> co-maintainers, as well as getting new developers involved with kernel
> maintenance in general, which could help with other issues too.

Great well then my preference is to not have Rust bindings for modules
unless the Rust community can commit to not only a co-maintianer for
both C And Rust but also commit to not ditching the role; if a C/Rust
co-maintainer gets hits by a bus the Rust community would strive to
look for someone else to step in. This would proactively help with
upstream responsibilities understood by companies who hire developers
in this context. It is why I brought up Andreas's work, I already know
he has a lot of work to do and responsibilities. If not Andreas, who else
can step up to help with this, Sami? While each company varies in
accepting a developer's roles in the community, I think we would stand
to gain to consider the long term aspects of this before it becomes an
issue, so we get employers to understand / accept this as part of our
work. I don't think this is an unreasonable for companies or developers
interested in Rust advancements.

This includes testing, helping improve tests and using existing tests
or automation tools for them so we don't regress.

Clearly, this isn't just about a module_params macro, for example
I'm starting to see other module related code I need to review and
having to be very careful to ensure all of what is ongoing with modules
like Kris's work on kbuild CONFIG_BUILTIN_MODULE_RANGES will still work
in a Rust modules world with Sami's work on module modversions.

[0] https://lkml.kernel.org/r/20240716031045.1781332-1-kris.van.h...@oracle.com

  Luis

Re: [PATCH] rust: add `module_params` macro

2024-07-08 Thread Luis Chamberlain

On Fri, Jul 05, 2024 at 11:15:11AM +, Andreas Hindborg wrote:
> From: Andreas Hindborg 
> 
> This patch includes changes required for Rust kernel modules to utilize
> module parameters. This code implements read only support for integer
> types without `sysfs` support.
> 
> This code is a reduced and updated version of code by Adam available in the
> original `rust` branch [1].
> 
> [1] 
> https://github.com/Rust-for-Linux/linux/tree/bc22545f38d74473cfef3e9fd65432733435b79f
> 
> Cc: Adam Bratschi-Kaye 
> Signed-off-by: Andreas Hindborg 
> 

This poses an interesting challenge I'd like to take up with the Rust
community. I'm fine with Rust bindings, however many C maintainers
neither don't speak / write Rust, and in many cases many don't even want
to touch Rust at all. In my case I want to get there.. but just haven't
had time yet. So we have to live with that world. But to help with a
Rust world, clearly we need to allow for some Rust bindings.

As a compromise, I recently suggested for example for the firmware_loader
Rust bindings to be acceptable if and only if we could get the developer
doing those changes to be willing to commit to also being *both* a C and
rust maintainer for the firmware loader.

I'm starting to feel the same way about modules, but modules requires
more work than the firmware loader. And since I also know Andreas has
already a lot on his plate, I'm at a cross roads.  My above request for
the firmware loader made sense to the person working on the firmware
loader changes, but who would help on the modules side of things? And
does this request make sense to help scale?

The rationale here is that a rust binding means commitment then also
from fresh blood to help co-maintain review C / Rust for exising code
when there is will / desire to collaborate from an existing C maintainer.

I realize this may be a lot to ask, but I think this is one of the
responsible ways to ask to scale here.

  Luis

Re: [PATCH] kallsyms, livepatch: Fix livepatch with CONFIG_LTO_CLANG

2024-07-08 Thread Luis Chamberlain

On Thu, Jul 04, 2024 at 11:02:18AM +0200, Petr Mladek wrote:
> On Wed 2024-07-03 08:30:33, Luis Chamberlain wrote:
> > On Tue, Jul 02, 2024 at 10:56:41PM -0700, Josh Poimboeuf wrote:
> > > On Mon, Jul 01, 2024 at 03:13:23PM +0200, Petr Mladek wrote:
> > > > So, you suggest to search the symbols by a hash. Do I get it correctly?
> > 
> > I meant, that in the Rust world the symbols go over the allowed limit,
> > and so an alternative for them is to just use a hash. What I'm
> > suggesting is for a new kconfig option where that world is the
> > new one, so that they have to also do the proper userspace tooling
> > for it. Without that, I don't see it as properly tested or scalable.
> > And if we're gonna have that option for Rust for modules, then it begs
> > the question if this can be used by other users.
> 
> I am still not sure at which level the symbol names would get hashed ;-)

Let's clarify that. The Rust world mangles symbols to be very long. In
order to allow external modules built to be usable in other systems we support
modversioning. To help with this each module has an array of crcs of each
symbol it depends on. This is added to the module.mod.c so for my built
xfs.mod.c I have for example:

static const struct modversion_info versions[]
__used __section("__versions") = {
{ 0x5b8239ca, "__x86_return_thunk" },
{ 0x93a6e0b2, "io_schedule" }, 
{ 0x6383b27c, "__x86_indirect_thunk_rdx" },
{ 0xc5b6f236, "queue_work_on" },
{ 0xd966a5ea, "list_lru_walk_one" },
...
{ 0xe3565824, "vfs_readlink" },
{ 0xaed0dc33, "bdev_freeze" },
{ 0x7ce18c9f, "from_kqid" },
{ 0xd36dc10c, "get_random_u32" },
{ 0xba4edeaf, "__percpu_down_read" },
{ 0xb9f07b78, "file_modified" },
{ 0x48755bd1, "module_layout"
},
};

If a module depends on a symbol from a Rust module or Rust built-in
code, it cannot fit becuase we currenlty define:

struct modversion_info {

unsigned long crc;  

char name[MODULE_NAME_LEN]; 

};  

Although perhaps this started off as thing for module versions only,
clearly we're using it for all symbols a module uses so MODULE_NAME_LEN
really was in the end silly and does not suffice today.

A version of the effort from Rust folks last year tried to modify the
length but its claimed that the blocker for that was that userspace
would need to change, so their new attempt tried to work around that
using hashes...

Looking at this again its not to me why Masahiro Yamada's suggestion on
that old patch series to just increase the length and put long symbols
names into its own section [0] could not be embraced with a new kconfig
option, so new kernels and new userspace could support it:

https://lore.kernel.org/lkml/CAK7LNATsuszFR7JB5ZkqVS1W=hWr9=e7btf+mvgj+nxt3az...@mail.gmail.com/

> The symbols names are used in many situations, e.g. backtraces,
> crashdump, objdump, nm, gdb, tracing, livepatching, kprobes, ...
> 
> Would kallsyms provide some translation table between the usual
> "long" symbol name and a hash?
> 
> Would it allows to search the symbols both ways?

Let's review the scope of both issues first, certainly Rust's issue is
limitted in scope, but since a new userspace change might be required
its worth reviewing if it may help the CONFIG_LTO_CLANG case as well.

> I am a bit scared because using hashed symbol names in backtraces, gdb,
> ... would be a nightmare. Hashes are not human readable and
> they would complicate the life a lot. And using different names
> in different interfaces would complicate the life either.

All great points.

The scope of the Rust issue is self contained to modversion_info,
whereas for CONFIG_LTO_CLANG issue commit 8b8e6b5d3b013b0
("kallsyms: strip ThinLTO hashes from static functions") describes
the issue with userspace tools (it doesn't explain which ones)
which don't expect the function name to change. This seems to happen
to static routines so I can only suspect this isn't an issue with
modversioning as the only symbols that would be used there wouldn't be
static.

Sami, what was the exact userspace issue with CONFIG_LTO_CLANG and these
long symbols?

  Luis

Re: [PATCH] kallsyms, livepatch: Fix livepatch with CONFIG_LTO_CLANG

2024-07-03 Thread Luis Chamberlain

On Tue, Jul 02, 2024 at 10:56:41PM -0700, Josh Poimboeuf wrote:
> On Mon, Jul 01, 2024 at 03:13:23PM +0200, Petr Mladek wrote:
> > So, you suggest to search the symbols by a hash. Do I get it correctly?

I meant, that in the Rust world the symbols go over the allowed limit,
and so an alternative for them is to just use a hash. What I'm
suggesting is for a new kconfig option where that world is the
new one, so that they have to also do the proper userspace tooling
for it. Without that, I don't see it as properly tested or scalable.
And if we're gonna have that option for Rust for modules, then it begs
the question if this can be used by other users.

> > Well, it might bring back the original problem. I mean
> > the commit 8b8e6b5d3b013b0 ("kallsyms: strip ThinLTO hashes from
> > static functions") added cleanup_symbol_name() so that user-space
> > tool do not need to take care of the "unstable" suffix.
> 
> Are symbol names really considered user ABI??  That's already broken by
> design.  Even without LTO, the toolchain can mangle them for a variety
> of reasons.
> 
> If a user space tool doesn't want the suffixes, surely it can figure out
> a way to deal with that on their own?
> 
> > So, it seems that we have two use cases:
> > 
> >1. Some user-space tools want to ignore the extra suffix. I guess
> >   that it is in the case when the suffix is added only because
> >   the function was optimized.
> > 
> >   It can't work if there are two different functions of the same
> >   name. Otherwise, the user-space tool would not know which one
> >   they are tracing.
> > 
> > 
> >2. There are other use-cases, including livepatching, where we
> >   want to be 100% sure that we match the right symbol.
> > 
> >   They want to match the full names. They even need to distinguish
> >   symbols with the same name.
> > 
> > 
> > IMHO, we need a separate API for each use-case.
> 
> We should just always link with -zunique-symbols so the duplicate
> symbols no longer exist.  That would solve a lot of problems.

While it might solve this other issue, it doesn't solve the rust module
long symbol name issue.

  Luis

Re: [PATCH] kallsyms, livepatch: Fix livepatch with CONFIG_LTO_CLANG

2024-06-28 Thread Luis Chamberlain

On Fri, Jun 28, 2024 at 02:23:49PM +0200, Miroslav Benes wrote:
> On Fri, 7 Jun 2024, Song Liu wrote:
> 
> > Hi Miroslav,
> > 
> > Thanks for reviewing the patch!
> > 
> > On Fri, Jun 7, 2024 at 6:06 AM Miroslav Benes  wrote:
> > >
> > > Hi,
> > >
> > > On Tue, 4 Jun 2024, Song Liu wrote:
> > >
> > > > With CONFIG_LTO_CLANG, the compiler may postfix symbols with 
> > > > .llvm.
> > > > to avoid symbol duplication. scripts/kallsyms.c sorted the symbols
> > > > without these postfixes. The default symbol lookup also removes these
> > > > postfixes before comparing symbols.
> > > >
> > > > On the other hand, livepatch need to look up symbols with the full 
> > > > names.
> > > > However, calling kallsyms_on_each_match_symbol with full name (with the
> > > > postfix) cannot find the symbol(s). As a result, we cannot livepatch
> > > > kernel functions with .llvm. postfix or kernel functions that use
> > > > relocation information to symbols with .llvm. postfixes.
> > > >
> > > > Fix this by calling kallsyms_on_each_match_symbol without the postfix;
> > > > and then match the full name (with postfix) in klp_match_callback.
> > > >
> > > > Signed-off-by: Song Liu 
> > > > ---
> > > >  include/linux/kallsyms.h | 13 +
> > > >  kernel/kallsyms.c| 21 -
> > > >  kernel/livepatch/core.c  | 32 +++-
> > > >  3 files changed, 60 insertions(+), 6 deletions(-)
> > >
> > > I do not like much that something which seems to be kallsyms-internal is
> > > leaked out. You need to export cleanup_symbol_name() and there is now a
> > > lot of code outside. I would feel much more comfortable if it is all
> > > hidden from kallsyms users and kept there. Would it be possible?
> > 
> > I think it is possible. Currently, kallsyms_on_each_match_symbol matches
> > symbols without the postfix. We can add a variation or a parameter, so
> > that it matches the full name with post fix.
> 
> I think it might be better.
> 
> Luis, what is your take on this?
> 
> If I am not mistaken, there was a patch set to address this. Luis might 
> remember more.

Yeah this is a real issue outside of CONFIG_LTO_CLANG, Rust modules is
another example where instead of symbol names they want to use full
hashes. So, as I hinted to you Sami, can we knock two birds with one stone
here and move CONFIG_LTO_CLANG to use the same strategy as Rust so we
have two users instead of just one? Then we resolve this. In fact
what I suggested was even to allow even non-Rust, and in this case
even with gcc to enable this world. This gives much more wider scope
of testing / review / impact of these sorts of changes and world view
and it would resolve the Rust case, the live patch CONFIG_LTO_CLANG world too.

Thoughts?

  Luis

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-28 Thread Luis Chamberlain

On Fri, Jun 21, 2024 at 04:05:27PM +0200, Daniel von Kirschten wrote:
> Am 18.06.2024 um 21:58 schrieb Luis Chamberlain:
> > On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> > > If a module is being loaded, and the .gnu.linkonce.this_module section
> > > in the module's ELF file does not have the WRITE flag, the kernel will
> > > map the finished module struct of that module as read-only.
> > > This causes a kernel panic when the struct is written to the first time
> > > after it has been marked read-only. Currently this happens in
> > > complete_formation in kernel/module/main.c:2765 when the module's state is
> > > set to MODULE_STATE_COMING, just after setting up the memory protections.
> > 
> > How did you find this issue?
> 
> In a university course I got the assignment to manually craft a loadable .ko
> file, given only a regular object file, without using Kbuild. During testing
> my module files, most of them were simply (correctly) rejected by the kernel
> with an appropriate error message, but at some point I ran into this exact
> kernel panic, and investigated it to understand why my module file was
> invalid.

OK, then the commit log should describe that this doesn't fix any known
real world issue, but rather a custom crafted module without the regular
module build system.

> > > Down the line, this seems to lead to unpredictable freezes when trying to
> > > load other modules - I guess this is due to some structures not being
> > > cleaned up properly, but I didn't investigate this further.
> > > 
> > > A check already exists which verifies that .gnu.linkonce.this_module
> > > is ALLOC. This patch simply adds an analogous check for WRITE.
> > 
> > Can you check to ensure our modules generated have a respective check to
> > ensure this check exists at build time? That would proactively inform
> > userspace when a built module is not built correctly, and the tool
> > responsible can be identified.
> 
> See above - I don't think it's possible to create such a broken module file
> with any of "official" tools.

That should be clearly stated on the commit log.

> I haven't looked too deeply into how Kbuild
> actually builds modules, but as far as I know, the user doesn't even come
> into contact with this_module w

Consider that a next level university assignment and is more useful to the world
than this debug message. Because above you suggest "I don't think", go
out and now be sure.

> hen using the regular toolchain, because
> Kbuild is responsible for creating the .this_module section. And Kbuild of
> course creates it with the correct flags. So if I understand correctly,

...

> this
> problem can only occur when the module was built by some external tooling
> (or manually, in my case).

Who would create custom modules without the Linux kernel module build
system, and what uses does that provide? It seems you are proving why
this would be terribly silly thing to do.

Now, the *value* your change has is it can prevent a crash in case of a
corrupted module, which *can* occur, consider an odd filesystem
live corruption, at least this would be caught at module load attempt
and not crash. That's worth committing for this reason but your commit
log really needs much more clarity. Why? Because stupid bots want to
assign stupid CVEs for anything that seems like a security issue and
this could escalate to such type of things. Providing clarity helps
system integrators decide if they want to backport this sort of patch.
Providing clarify on the chances of this happening and how we think it
can happen helps a lot.

If you want to be more proactive, try to enhance userspace kmod modprobe
so that this is also verified.

  Luis

Re: [PATCH v3] module: Add log info for verifying module signature

2024-06-28 Thread Luis Chamberlain

On Fri, Jun 28, 2024 at 10:39:23AM +, Yusong Gao wrote:
> Add log information in kernel-space when loading module failures.
> Try to load the unsigned module and the module with bad signature
> when set 1 to /sys/module/module/parameters/sig_enforce.
> 
> Unsigned module case:
> (linux) insmod unsigned.ko
> [   18.714661] Loading of unsigned module is rejected
> insmod: can't insert 'unsigned.ko': Key was rejected by service
> (linux)
> 
> Bad signature module case:
> (linux) insmod bad_signature.ko
> insmod: can't insert 'bad_signature.ko': Key was rejected by service
> (linux)
> 
> There have different logging behavior the bad signature case only log
> in user-space, add log info for fatal errors in module_sig_check().
> 
> Signed-off-by: Yusong Gao 
> ---
> V3: Clarify the message type and the error code meaning.
> V2: Change print level from notice to debug.
> ---
>  kernel/module/signing.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/kernel/module/signing.c b/kernel/module/signing.c
> index a2ff4242e623..826cdab8e3e4 100644
> --- a/kernel/module/signing.c
> +++ b/kernel/module/signing.c
> @@ -67,6 +67,31 @@ int mod_verify_sig(const void *mod, struct load_info *info)
> NULL, NULL);
>  }
>  
> +static const char *mod_decode_error(int errno)
> +{
> + char *errstr = "Unrecognized error";

This is not safe. You can just extend the existing debug switch for
strict module loading and re-use the variable there and use that,
for example

diff --git a/kernel/module/signing.c b/kernel/module/signing.c
index a2ff4242e623..9111822116e6 100644
--- a/kernel/module/signing.c
+++ b/kernel/module/signing.c
@@ -106,6 +106,9 @@ int module_sig_check(struct load_info *info, int flags)
case -ENOKEY:
reason = "module with unavailable key";
break;
+   case -EKEYREJECTED:
+   reason = "Key was rejected by service";
+   break;
 
default:
/*
@@ -113,6 +116,7 @@ int module_sig_check(struct load_info *info, int flags)
 * unparseable signatures, and signature check failures --
 * even if signatures aren't required.
 */
+   pr_debug("Verifying module signature failed: %s\n", reason);
return err;
}

Also certs/system_keyring.c already has a lot of pr_devel stuff too, so
do we really need this?

  Luis

Re: [PATCH v2] module: Add log information for loading module failures

2024-06-20 Thread Luis Chamberlain

On Wed, Jun 19, 2024 at 08:30:37AM +, Yusong Gao wrote:
> Add log information in kernel-space when loading module failures.
> Try to load the unsigned module and the module with bad signature
> when set 1 to /sys/module/module/parameters/sig_enforce.
> 
> Unsigned module case:
> (linux) insmod unsigned.ko
> [   18.714661] Loading of unsigned module is rejected
> insmod: can't insert 'unsigned.ko': Key was rejected by service
> (linux)
> 
> Bad signature module case:
> (linux) insmod bad_signature.ko
> insmod: can't insert 'bad_signature.ko': Key was rejected by service
> (linux)
> 
> There have different logging behavior the bad signature case only log
> in user-space, add log info for fatal errors in module_sig_check().
> 
> Signed-off-by: Yusong Gao 
> ---
> V2: Change print level from notice to debug.
> ---
>  kernel/module/signing.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module/signing.c b/kernel/module/signing.c
> index a2ff4242e623..b0c7b30e4d89 100644
> --- a/kernel/module/signing.c
> +++ b/kernel/module/signing.c
> @@ -113,6 +113,7 @@ int module_sig_check(struct load_info *info, int flags)
>* unparseable signatures, and signature check failures --
>* even if signatures aren't required.
>*/
> + pr_debug("Loading module failed (errno=%d)\n", -err);

There are so many debug message, this doesn't help explain why, ie, tha
this is a signature issue.

  Luis

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Luis Chamberlain

On Tue, Jun 18, 2024 at 02:19:47PM -0700, Sami Tolvanen wrote:
> Hi Luis,
> 
> On Tue, Jun 18, 2024 at 12:42:51PM -0700, Luis Chamberlain wrote:
> > a) Ensure correctness for all users / tools, so that proper plumbing is
> >really done. By considering all symbols you increase your scope of
> >awareness of anything that could really break.
> > 
> > b) Remove the "Rust" nature about this
> > 
> > c) Rust modules just becomes a *user* of this approach
> 
> I believe the only Rust nature here is the last patch that enables
> gendwarfksyms only for Rust. Otherwise, there shouldn't be anything
> strictly Rust specific.

Right now the check for length is generic, and assumes a hash may exist
without considering that check is moot for non-rust modules. So the
inverse is true, but doesn't provide a solution or proper architecture
for it.

> > It gets me wondering, given Kris is also working on allowing traces to
> > map symbols to module names, how does this fit into that world [0]?
> 
> AFAIK long symbol names are only a problem for struct modversion_info,
> which uses a relatively short name buffer, so I'm hoping other efforts
> won't be affected.

We'll see!

> > As for a) the reason I'm thinking about having the ability to test a
> > full real kernel and moules with this is, without that, how are you sure
> > you have the full scope of the changes needed?
> 
> Sure, I can look into hooking this up for the entire kernel and seeing
> what breaks, in addition the issues Masahiro pointed out, of course.

:) should be fun!

I think once its revised as a "generic" strategy and not a Rust one, the
proper architecture will be considered. Right now, it just looks like a
cheap band aid for Rust.

  Luis

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-18 Thread Luis Chamberlain

On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> If a module is being loaded, and the .gnu.linkonce.this_module section
> in the module's ELF file does not have the WRITE flag, the kernel will
> map the finished module struct of that module as read-only.
> This causes a kernel panic when the struct is written to the first time
> after it has been marked read-only. Currently this happens in
> complete_formation in kernel/module/main.c:2765 when the module's state is
> set to MODULE_STATE_COMING, just after setting up the memory protections.

How did you find this issue?

> Down the line, this seems to lead to unpredictable freezes when trying to
> load other modules - I guess this is due to some structures not being
> cleaned up properly, but I didn't investigate this further.
> 
> A check already exists which verifies that .gnu.linkonce.this_module
> is ALLOC. This patch simply adds an analogous check for WRITE.

Can you check to ensure our modules generated have a respective check to
ensure this check exists at build time? That would proactively inform
userspace when a built module is not built correctly, and the tool
responsible can be identified.

  Luis

Re: [PATCH v3] module: create weak dependecies

2024-06-18 Thread Luis Chamberlain

On Tue, May 14, 2024 at 09:25:55AM -0500, Lucas De Marchi wrote:
> On Fri, May 10, 2024 at 10:57:22AM GMT, Jose Ignacio Tornos Martinez wrote:
> > It has been seen that for some network mac drivers (i.e. lan78xx) the
> > related module for the phy is loaded dynamically depending on the current
> > hardware. In this case, the associated phy is read using mdio bus and then
> > the associated phy module is loaded during runtime (kernel function
> > phy_request_driver_module). However, no software dependency is defined, so
> > the user tools will no be able to get this dependency. For example, if
> > dracut is used and the hardware is present, lan78xx will be included but no
> > phy module will be added, and in the next restart the device will not work
> > from boot because no related phy will be found during initramfs stage.
> > 
> > In order to solve this, we could define a normal 'pre' software dependency
> > in lan78xx module with all the possible phy modules (there may be some),
> > but proceeding in that way, all the possible phy modules would be loaded
> > while only one is necessary.
> > 
> > The idea is to create a new type of dependency, that we are going to call
> > 'weak' to be used only by the user tools that need to detect this situation.
> > In that way, for example, dracut could check the 'weak' dependency of the
> > modules involved in order to install these dependencies in initramfs too.
> > That is, for the commented lan78xx module, defining the 'weak' dependency
> > with the possible phy modules list, only the necessary phy would be loaded
> > on demand keeping the same behavior, but all the possible phy modules would
> > be available from initramfs.
> > 
> > The 'weak' dependency support has been included in kmod:
> > https://github.com/kmod-project/kmod/commit/05828b4a6e9327a63ef94df544a042b5e9ce4fe7
> > But, take into account that this can only be used if depmod is new enough.
> > If it isn't, depmod will have the same behavior as always (keeping backward
> > compatibility) and the information for the 'weak' dependency will not be
> > provided.
> > 
> > Signed-off-by: Jose Ignacio Tornos Martinez 
> 
> 
> Reviewed-by: Lucas De Marchi 

Thanks! Applied and pushed to modules-next.

  Luis

Re: [PATCH 00/15] Implement MODVERSIONS for Rust

2024-06-18 Thread Luis Chamberlain

On Mon, Jun 17, 2024 at 05:58:19PM +, Sami Tolvanen wrote:
> Hi folks,
> 
> This series implements CONFIG_MODVERSIONS for Rust, an important
> feature for distributions like Android that want to ship Rust
> kernel modules, and depend on modversions to help ensure module ABI
> compatibility.
> 
> There have been earlier proposals [1][2] that would allow Rust
> modules to coexist with modversions, but none that actually implement
> symbol versioning. Unlike C, Rust source code doesn't have sufficient
> information about the final ABI, as the compiler has considerable
> freedom in adjusting structure layout for improved performance [3],
> for example, which makes using a source code parser like genksyms
> a non-starter. Based on Matt's suggestion and previous feedback
> from maintainers, this series uses DWARF debugging information for
> computing versions. DWARF is an established and relatively stable
> format, which includes all the necessary ABI details, and adding a
> CONFIG_DEBUG_INFO dependency for Rust symbol versioning seems like a
> reasonable trade-off.

OK sure.

> The first 12 patches of this series add a small tool for computing
> symbol versions from DWARF, called gendwarfksyms. When passed a list
> of exported symbols, the tool generates an expanded type string
> for each symbol, and computes symbol CRCs similarly to genksyms.

So this is too word centric Rust, let's think about this generically.
We still ahve a symbol limitation even in the C world then, and this
solution can solve that problem also for other reasons for *whatever*
reason we devise to-come-up-with-in-the-future for augmenting symbols.
Today Rust, tomorrow, who knows.

> gendwarfksyms is written in C and uses libdw to process DWARF, mainly
> because of the existing support for C host tools that use elfutils
> (e.g., objtool).

I agree with Masahiro, that testing this with vmlinux would be eye
opening to what challenges we really have ahead. So, to help with this
let's try to re-think about this  from another perspective.

Yes, old userspace should not break, but you can add yet another option
to let you opt-in to a new world order of how these crc are mapped to
hashed repersentations of the symbols. This would allow us to:

a) Ensure correctness for all users / tools, so that proper plumbing is
   really done. By considering all symbols you increase your scope of
   awareness of anything that could really break.

b) Remove the "Rust" nature about this

c) Rust modules just becomes a *user* of this approach

It gets me wondering, given Kris is also working on allowing traces to
map symbols to module names, how does this fit into that world [0]?

As for a) the reason I'm thinking about having the ability to test a
full real kernel and moules with this is, without that, how are you sure
you have the full scope of the changes needed?

[0] https://lkml.kernel.org/r/20240614171428.968174-3-kris.van.h...@oracle.com

  Luis

Re: [PATCH v4 2/3] kbuild, kconfig: generate offset range data for builtin modules

2024-06-18 Thread Luis Chamberlain

On Fri, Jun 14, 2024 at 01:14:27PM -0400, Kris Van Hees wrote:
> The offset range data for builtin modules is generated using:
>  - modules.builtin: associates object files with module names
>  - vmlinux.map: provides load order of sections and offset of first member
> per section
>  - vmlinux.o.map: provides offset of object file content per section
>  - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME

What tests do we have to ensure this is working correctly and not
spewing out lies? What proactive mechanisms do we have to verify the
semantics won't change, or to warn at build time that this awk script
will break upon new changes? Is this just best effort? Is that good
enough? Why?

  Luis

Re: [PATCH] module: Add log information for loading module failures

2024-06-18 Thread Luis Chamberlain

On Fri, Jun 14, 2024 at 09:25:19AM +, Yusong Gao wrote:
> Add log information in kernel-space when loading module failures.
> Try to load the unsigned module and the module with bad signature
> when set 1 to /sys/module/module/parameters/sig_enforce.
> 
> Unsigned module case:
> (linux) insmod unsigned.ko
> [   18.714661] Loading of unsigned module is rejected
> insmod: can't insert 'unsigned.ko': Key was rejected by service
> (linux)
> 
> Bad signature module case:
> (linux) insmod bad_signature.ko
> insmod: can't insert 'bad_signature.ko': Key was rejected by service
> (linux)
> 
> There have different logging behavior the bad signature case only log
> in user-space, add log info for fatal errors in module_sig_check().
> 
> Signed-off-by: Yusong Gao 
> ---
>  kernel/module/signing.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module/signing.c b/kernel/module/signing.c
> index a2ff4242e623..6a6493c8f7e4 100644
> --- a/kernel/module/signing.c
> +++ b/kernel/module/signing.c
> @@ -113,6 +113,7 @@ int module_sig_check(struct load_info *info, int flags)
>* unparseable signatures, and signature check failures --
>* even if signatures aren't required.
>*/
> + pr_notice("Loading module failed (errno=%d)\n", -err);
>   return err;

I welcome pr_debug() messages but if we were to add a regular print for every
single type of failure we'd clutter the code, we don't want that.

  Luis

[GIT PULL] Modules changes for v6.10-rc1

2024-05-14 Thread Luis Chamberlain

The following changes since commit a5131c3fdf2608f1c15f3809e201cf540eb28489:

  Merge tag 'x86-shstk-2024-05-13' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2024-05-13 19:33:23 
-0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-6.10-rc1

for you to fetch changes up to 2c9e5d4a008293407836d29d35dfd4353615bd2f:

  bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of (2024-05-14 
00:36:29 -0700)


Modules changes for v6.10-rc1

Finally something fun. Mike Rapoport does some cleanup to allow us to
take out module_alloc() out of modules into a new paint shedded execmem_alloc()
and execmem_free() so to make emphasis these helpers are actually used outside
of modules. It starts with a no-functional changes API rename / placeholders
to then allow architectures to define their requirements into a new shiny
struct execmem_info with ranges, and requirements for those ranges. Archs
now can intitialize this execmem_info as the last part of mm_core_init() if
they have to diverge from the norm. Each range is a known type clearly
articulated and spelled out in enum execmem_type.

Although a lot of this is major cleanup and prep work for future enhancements an
immediate clear gain is we get to enable KPROBES without MODULES now. That is
ultimately what motiviated to pick this work up again, now with smaller goal as
concrete stepping stone.

This has been sitting on linux-next for a little less than a month, a few issues
were found already and fixed, in particular an odd mips boot issue. Arch folks
reviewed the code too. This is ready for wider exposure and testing.


Justin Stitt (1):
  kallsyms: replace deprecated strncpy with strscpy

Mike Rapoport (IBM) (16):
  arm64: module: remove unneeded call to kasan_alloc_module_shadow()
  mips: module: rename MODULE_START to MODULES_VADDR
  nios2: define virtual address space for modules
  sparc: simplify module_alloc()
  module: make module_memory_{alloc,free} more self-contained
  mm: introduce execmem_alloc() and execmem_free()
  mm/execmem, arch: convert simple overrides of module_alloc to execmem
  mm/execmem, arch: convert remaining overrides of module_alloc to execmem
  riscv: extend execmem_params for generated code allocations
  arm64: extend execmem_info for generated code allocations
  powerpc: extend execmem_params for kprobes allocations
  arch: make execmem setup available regardless of CONFIG_MODULES
  x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
  powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriate
  kprobes: remove dependency on CONFIG_MODULES
  bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

Yifan Hong (1):
  module: allow UNUSED_KSYMS_WHITELIST to be relative against objtree.

 arch/Kconfig |  10 ++-
 arch/arm/kernel/module.c |  34 -
 arch/arm/mm/init.c   |  45 +++
 arch/arm64/Kconfig   |   1 +
 arch/arm64/kernel/module.c   | 126 --
 arch/arm64/kernel/probes/kprobes.c   |   7 --
 arch/arm64/mm/init.c | 140 ++
 arch/arm64/net/bpf_jit_comp.c|  11 ---
 arch/loongarch/kernel/module.c   |   6 --
 arch/loongarch/mm/init.c |  21 +
 arch/mips/include/asm/pgtable-64.h   |   4 +-
 arch/mips/kernel/module.c|  10 ---
 arch/mips/mm/fault.c |   4 +-
 arch/mips/mm/init.c  |  23 ++
 arch/nios2/include/asm/pgtable.h |   5 +-
 arch/nios2/kernel/module.c   |  20 -
 arch/nios2/mm/init.c |  21 +
 arch/parisc/kernel/module.c  |  12 ---
 arch/parisc/mm/init.c|  23 +-
 arch/powerpc/Kconfig |   2 +-
 arch/powerpc/include/asm/kasan.h |   2 +-
 arch/powerpc/kernel/head_8xx.S   |   4 +-
 arch/powerpc/kernel/head_book3s_32.S |   6 +-
 arch/powerpc/kernel/kprobes.c|  22 +-
 arch/powerpc/kernel/module.c |  38 --
 arch/powerpc/lib/code-patching.c |   2 +-
 arch/powerpc/mm/book3s32/mmu.c   |   2 +-
 arch/powerpc/mm/mem.c|  64 
 arch/riscv/include/asm/pgtable.h |   3 +
 arch/riscv/kernel/module.c   |  12 ---
 arch/riscv/kernel/probes/kprobes.c   |  10 ---
 arch/riscv/mm/init.c |  35 +
 arch/riscv/net/bpf_jit_core.c|  13 
 arch/s390/kernel/ftrace.c|   4 +-
 arch/s390/kernel/kprobes.c   |   4 +-
 arch/s390/kernel/module.c|  42 +-
 arch/s390/mm/init.c  |  30 
 arch/sparc/include/asm/pgtable_32.h  |   2 +
 arch/sparc/kernel/module.c   |  30

Re: [PATCH] modules: Drop the .export_symbol section from the final modules

2024-05-11 Thread Luis Chamberlain

On Wed, Apr 17, 2024 at 01:35:30PM +0800, wang...@lemote.com wrote:
> From: Wang Yao 
> 
> Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
> forget drop the .export_symbol section from the final modules.
> 
> Signed-off-by: Wang Yao 

Masahiro, commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by
modpost") was your change, wanna address / take it through your
tree? It makes sense to me though.

  Luis

> ---
>  scripts/module.lds.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> index bf5bcf2836d8..89ff01a22634 100644
> --- a/scripts/module.lds.S
> +++ b/scripts/module.lds.S
> @@ -13,6 +13,7 @@ SECTIONS {
>   /DISCARD/ : {
>   *(.discard)
>   *(.discard.*)
> + *(.export_symbol)
>   }
>  
>   __ksymtab   0 : { *(SORT(___ksymtab+*)) }
> -- 
> 2.27.0
>

Re: [PATCH RESEND v8 00/16] mm: jit/text allocator

2024-05-05 Thread Luis Chamberlain

On Sun, May 05, 2024 at 07:06:12PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v8
> 
> v8:
> * fix intialization of default_execmem_info

Thanks, applied and pushed to modules-next. If we find fixes, let's
please just now have separate patches on top of this series.

  Luis

Re: [PATCH v7 00/16] mm: jit/text allocator

2024-05-02 Thread Luis Chamberlain

On Thu, May 02, 2024 at 11:50:36PM +0100, Liviu Dudau wrote:
> On Mon, Apr 29, 2024 at 09:29:20AM -0700, Luis Chamberlain wrote:
> > On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (IBM)" 
> > > 
> > > Hi,
> > > 
> > > The patches are also available in git:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> > > 
> > > v7 changes:
> > > * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
> > >   #ifdefs in a function body
> > > * add Acks, thanks everybody
> > 
> > Thanks, I've pushed this to modules-next for further exposure / testing.
> > Given the status of testing so far with prior revisions, in that only a
> > few issues were found and that those were fixed, and the status of
> > reviews, this just might be ripe for v6.10.
> 
> Looks like there is still some work needed. I've picked up next-20240501
> and on arch/mips with CONFIG_MODULE_COMPRESS_XZ=y and 
> CONFIG_MODULE_DECOMPRESS=y
> I fail to load any module:
> 
> # modprobe rfkill
> [11746.539090] Invalid ELF header magic: != ELF
> [11746.587149] execmem: unable to allocate memory
> modprobe: can't load module rfkill (kernel/net/rfkill/rfkill.ko.xz): Out of 
> memory
> 
> The (hopefully) relevant parts of my .config:

Thanks for the report! Any chance we can get you to try a bisection? I
think it should take 2-3 test boots. To help reduce scope you try modules-next:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

Then can you check by resetting your tree to commmit 3fbe6c2f820a76 (mm:
introduce execmem_alloc() and execmem_free()"). I suspect that should
boot, so your bad commit would be the tip 3c2c250cb3a5fbb ("bpf: remove
CONFIG_BPF_JIT dependency on CONFIG_MODULES of").

That gives us only a few commits to bisect:

git log --oneline 3fbe6c2f820a76bc36d5546bda85832f57c8fce2..
3c2c250cb3a5 (HEAD -> modules-next, korg/modules-next) bpf: remove 
CONFIG_BPF_JIT dependency on CONFIG_MODULES of
11e8e65cce5c kprobes: remove dependency on CONFIG_MODULES
e10cbc38697b powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where 
appropriate
4da3d38f24c5 x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
13ae3d74ee70 arch: make execmem setup available regardless of CONFIG_MODULES
460bbbc70a47 powerpc: extend execmem_params for kprobes allocations
e1a14069b5b4 arm64: extend execmem_info for generated code allocations
971e181c6585 riscv: extend execmem_params for generated code allocations
0fa276f26721 mm/execmem, arch: convert remaining overrides of module_alloc to 
execmem
022cef244287 mm/execmem, arch: convert simple overrides of module_alloc to 
execmem

With 2-3 boots we should be to tell which is the bad commit.

  Luis

Re: [PATCH v7 00/16] mm: jit/text allocator

2024-04-29 Thread Luis Chamberlain

On Mon, Apr 29, 2024 at 03:16:04PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v7
> 
> v7 changes:
> * define MODULE_{VADDR,END} for riscv32 to fix the build and avoid
>   #ifdefs in a function body
> * add Acks, thanks everybody

Thanks, I've pushed this to modules-next for further exposure / testing.
Given the status of testing so far with prior revisions, in that only a
few issues were found and that those were fixed, and the status of
reviews, this just might be ripe for v6.10.

  Luis

Re: [PATCH v6 00/16] mm: jit/text allocator

2024-04-26 Thread Luis Chamberlain

On Fri, Apr 26, 2024 at 11:28:38AM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> The patches are also available in git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v6
> 
> v6 changes:
> * restore patch "arm64: extend execmem_info for generated code
>   allocations" that disappeared in v5 rebase
> * update execmem initialization so that by default it will be
>   initialized early while late initialization will be an opt-in

I've taken this new iteration again through modules-next so to help get
more testing exposure to this. If we run into conflicts again with mm
we can see if Andrew is willing to take this in through there. However,
it may make sense to only consider up to "mm: introduce execmem_alloc() and
execmem_free()" for v6.10 given we're bound to likely find more issues
and we are already at rc5.

  Luis

Re: [PATCH v5 00/15] mm: jit/text allocator

2024-04-23 Thread Luis Chamberlain

On Mon, Apr 22, 2024 at 12:44:21PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> (something went wrong with the prevois posting, sorry for the noise)
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better than having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v5

Thanks! I've merged and pushed this onto modules-next in its entirety now for
wider testing.

  Luis

Re: [PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-14 Thread Luis Chamberlain

On Sun, Apr 14, 2024 at 10:05:05PM +0300, Alexey Dobriyan wrote:
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3616,4 +3616,12 @@ extern int vfs_fadvise(struct file *file, loff_t 
> offset, loff_t len,
>  extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
>  int advice);
>  
> +/*
> + * Use this if data from userspace end up as directory/filename on
> + * some virtual filesystem.
> + */
> +static inline bool string_is_vfs_ready(const char *s)
> +{
> + return strcmp(s, ".") != 0 && strcmp(s, "..") != 0 && !strchr(s, '/');
> +}
>  #endif /* _LINUX_FS_H */
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2893,6 +2893,11 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>  
>   audit_log_kern_module(mod->name);
>  
> + if (!string_is_vfs_ready(mod->name)) {
> + err = -EINVAL;
> + goto free_module;
> + }
> +

Sensible change however to put string_is_vfs_ready() in include/linux/fs.h 
is a stretch if there really are no other users.

  Luis

Re: [PATCH v4 00/15] mm: jit/text allocator

2024-04-11 Thread Luis Chamberlain

On Thu, Apr 11, 2024 at 07:00:36PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better that having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v4

I've taken the first 5 patches through modules-next for now to get early
exposure to testing. Of those I just had minor nit feedback on the 5th,
but the rest look good.

Let's wait for review for the rest of the patches 6-15.

  Luis

Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-11 Thread Luis Chamberlain

On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" 
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystems
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> Start splitting code allocation from modules by introducing execmem_alloc()
> and execmem_free() APIs.
> 
> Initially, execmem_alloc() is a wrapper for module_alloc() and
> execmem_free() is a replacement of module_memfree() to allow updating all
> call sites to use the new APIs.
> 
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem_alloc() takes
> a type argument, that will be used to identify the calling subsystem and to
> allow architectures define parameters for ranges suitable for that
> subsystem.

It would be good to describe this is a non-fuctional change.

> Signed-off-by: Mike Rapoport (IBM) 
> ---

> diff --git a/mm/execmem.c b/mm/execmem.c
> new file mode 100644
> index ..ed2ea41a2543
> --- /dev/null
> +++ b/mm/execmem.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0

And this just needs to copy over the copyright notices from the main.c file.

  Luis

Re: [PATCH] [v4] module: don't ignore sysfs_create_link() failures

2024-04-08 Thread Luis Chamberlain

On Mon, Apr 08, 2024 at 10:05:58AM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The sysfs_create_link() return code is marked as __must_check, but the
> module_add_driver() function tries hard to not care, by assigning the
> return code to a variable. When building with 'make W=1', gcc still
> warns because this variable is only assigned but not used:
> 
> drivers/base/module.c: In function 'module_add_driver':
> drivers/base/module.c:36:6: warning: variable 'no_warn' set but not used 
> [-Wunused-but-set-variable]
> 
> Rework the code to properly unwind and return the error code to the
> caller. My reading of the original code was that it tries to
> not fail when the links already exist, so keep ignoring -EEXIST
> errors.
> 
> Fixes: e17e0f51aeea ("Driver core: show drivers in /sys/module/")
> Reviewed-by: Greg Kroah-Hartman 

Reviewed-by: Luis Chamberlain 

  Luis

[GIT PULL] Modules changes for v6.9-rc1

2024-03-12 Thread Luis Chamberlain

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

  Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-6.9-rc1

for you to fetch changes up to d1909c0221739356f31c721de4743e7d219a56cc:

  module: Don't ignore errors from set_memory_XX() (2024-02-16 11:30:43 -0800)


Modules changes for v6.9-rc1

Christophe Leroy did most of the work on this release, first with a few
cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error handling for
when set_memory_XX() can fail. This is part of a larger effort to clean
up all these callers which can fail, modules is just part of it.

This has been sitting on linux-next for about a month without issues.


Christophe Leroy (6):
  module: Use set_memory_rox()
  module: Change module_enable_{nx/x/ro}() to more explicit names
  init: Declare rodata_enabled and mark_rodata_ro() at all time
  modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
  powerpc: Simplify strict_kernel_rwx_enabled()
  module: Don't ignore errors from set_memory_XX()

Randy Dunlap (1):
  lib/test_kmod: fix kernel-doc warnings

 arch/powerpc/include/asm/mmu.h |  9 +-
 include/linux/init.h   |  4 ---
 init/main.c| 21 +-
 kernel/module/internal.h   |  6 ++--
 kernel/module/main.c   | 20 +++---
 kernel/module/strict_rwx.c | 63 +++---
 lib/test_kmod.c|  6 +++-
 7 files changed, 73 insertions(+), 56 deletions(-)

Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-03-12 Thread Luis Chamberlain

On Tue, Mar 12, 2024 at 03:25:27PM -0700, Luis Chamberlain wrote:
> On Tue, Feb 13, 2024 at 02:10:45PM +0300, Andrew Kanner wrote:
> > On Thu, Feb 01, 2024 at 10:13:54AM -0800, Luis Chamberlain wrote:
> > > 
> > > While you're at it, if you want to try it, you could see if you can
> > > improve the situation more by looking at symbol_get() users that remain
> > > and seeing if you can instead fix it with proper Kconfig dependency and
> > > at build time. Then we can just remove it as well.
> > > 
> > >   Luis
> > 
> > Sorry for the late reply.
> > 
> > Luis, can you give more details of your idea? I re-read it once, then
> > came back and still don't understand.
> > 
> > I see that there are ~10 users for symbol_get() currently. Do you want
> > to stringify symbol names at build time to completely remove
> > symbol_get() from module.h? Correct me if I'm wrong since using of a
> > fuction which is not declared anywhere sounds confusing.
> 
> As an example look at the code and see if there's a sensible way to make
> some calls built-in instead of part of the module, then the module can
> have a kconfig builtin option, that adds to the built-in code which
> means you don't need the symbol_get().
> 
> For some other pieces of code it may require other strategies.

An example is FW_LOADER_USER_HELPER which is bool only, and is selected
by users. It didn't use symbol_get() before, however its an example of
how through Kconfig you can align requirements and define built-in
components, even if they do come from a module.

  Luis

Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-03-12 Thread Luis Chamberlain

On Tue, Feb 13, 2024 at 02:10:45PM +0300, Andrew Kanner wrote:
> On Thu, Feb 01, 2024 at 10:13:54AM -0800, Luis Chamberlain wrote:
> > 
> > While you're at it, if you want to try it, you could see if you can
> > improve the situation more by looking at symbol_get() users that remain
> > and seeing if you can instead fix it with proper Kconfig dependency and
> > at build time. Then we can just remove it as well.
> > 
> >   Luis
> 
> Sorry for the late reply.
> 
> Luis, can you give more details of your idea? I re-read it once, then
> came back and still don't understand.
> 
> I see that there are ~10 users for symbol_get() currently. Do you want
> to stringify symbol names at build time to completely remove
> symbol_get() from module.h? Correct me if I'm wrong since using of a
> fuction which is not declared anywhere sounds confusing.

As an example look at the code and see if there's a sensible way to make
some calls built-in instead of part of the module, then the module can
have a kconfig builtin option, that adds to the built-in code which
means you don't need the symbol_get().

For some other pieces of code it may require other strategies.

  Luis

Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

2024-03-07 Thread Luis Chamberlain

On Thu, Mar 7, 2024 at 6:50 PM Masami Hiramatsu  wrote:
>
> On Wed, 6 Mar 2024 17:58:14 -0800
> Song Liu  wrote:
>
> > Hi Calvin,
> >
> > It is great to hear from you! :)
> >
> > On Wed, Mar 6, 2024 at 3:23 PM Calvin Owens  wrote:
> > >
> > > On Wednesday 03/06 at 13:34 -0800, Luis Chamberlain wrote:
> > > > On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> > > > > Hello all,
> > > > >
> > > > > This patchset makes it possible to use bpftrace with kprobes on 
> > > > > kernels
> > > > > built without loadable module support.
> > > >
> > > > This is a step in the right direction for another reason: clearly the
> > > > module_alloc() is not about modules, and we have special reasons for it
> > > > now beyond modules. The effort to share a generalize a huge page for
> > > > these things is also another reason for some of this but that is more
> > > > long term.
> > > >
> > > > I'm all for minor changes here so to avoid regressions but it seems a
> > > > rename is in order -- if we're going to all this might as well do it
> > > > now. And for that I'd just like to ask you paint the bikeshed with
> > > > Song Liu as he's been the one slowly making way to help us get there
> > > > with the "module: replace module_layout with module_memory",
> > > > and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
> > > > the EXECMEM stuff would be what we use instead then. Mike kept the
> > > > module_alloc() and the execmem was just a wrapper but your move of the
> > > > arch stuff makes sense as well and I think would complement his series
> > > > nicely.
> > >
> > > I apologize for missing that. I think these are the four most recent
> > > versions of the different series referenced from that LWN link:
> > >
> > >   a) https://lore.kernel.org/all/20230918072955.2507221-1-r...@kernel.org/
> > >   b) https://lore.kernel.org/all/20230526051529.3387103-1-s...@kernel.org/
> > >   c) https://lore.kernel.org/all/20221107223921.3451913-1-s...@kernel.org/
> > >   d) 
> > > https://lore.kernel.org/all/20201120202426.18009-1-rick.p.edgeco...@intel.com/
> > >
> > > Song and Mike, please correct me if I'm wrong, but I think what I've
> > > done here (see [1], sorry for not adding you initially) is compatible
> > > with everything both of you have recently proposed above. How do you
> > > feel about this as a first step?
> >
> > I agree that the work here is compatible with other efforts. I have no
> > objection to making this the first step.
> >
> > >
> > > For naming, execmem_alloc() seems reasonable to me? I have no strong
> > > feelings at all, I'll just use that going forward unless somebody else
> > > expresses an opinion.
> >
> > I am not good at naming things. No objection from me to "execmem_alloc".
>
> Hm, it sounds good to me too. I think we should add a patch which just
> rename the module_alloc/module_memfree with execmem_alloc/free first.

I think that would be cleaner, yes. Leaving the possible move to a
secondary patch and placing the testing more on the later part.

 Luis

Re: [RFC][PATCH 0/4] Make bpf_jit and kprobes work with CONFIG_MODULES=n

2024-03-06 Thread Luis Chamberlain

On Wed, Mar 06, 2024 at 12:05:07PM -0800, Calvin Owens wrote:
> Hello all,
> 
> This patchset makes it possible to use bpftrace with kprobes on kernels
> built without loadable module support.

This is a step in the right direction for another reason: clearly the
module_alloc() is not about modules, and we have special reasons for it
now beyond modules. The effort to share a generalize a huge page for
these things is also another reason for some of this but that is more
long term.

I'm all for minor changes here so to avoid regressions but it seems a
rename is in order -- if we're going to all this might as well do it
now. And for that I'd just like to ask you paint the bikeshed with
Song Liu as he's been the one slowly making way to help us get there
with the "module: replace module_layout with module_memory",
and Mike Rapoport as he's had some follow up attempts [0]. As I see it,
the EXECMEM stuff would be what we use instead then. Mike kept the
module_alloc() and the execmem was just a wrapper but your move of the
arch stuff makes sense as well and I think would complement his series
nicely.

If you're gonna split code up to move to another place, it'd be nice
if you can add copyright headers as was done with the kernel/module.c
split into kernel/module/*.c

Can we start with some small basic stuff we can all agree on?

[0] https://lwn.net/Articles/944857/

  Luis

Re: [PATCH v4] modules: wait do_free_init correctly

2024-02-27 Thread Luis Chamberlain

On Tue, Feb 27, 2024 at 10:35:46AM +0800, Changbin Du wrote:
> The synchronization here is to ensure the ordering of freeing of a module
> init so that it happens before W+X checking. It is worth noting it is not
> that the freeing was not happening, it is just that our sanity checkers
> raced against the permission checkers which assume init memory is already
> gone.
> 
> Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved
> calling do_free_init() into a global workqueue instead of relying on it
> being called through call_rcu(..., do_free_init), which used to allowed us
> call do_free_init() asynchronously after the end of a subsequent grace
> period. The move to a global workqueue broke the gaurantees for code which
> needed to be sure the do_free_init() would complete with rcu_barrier().
> To fix this callers which used to rely on rcu_barrier() must now instead
> use flush_work(_free_wq).
> 
> Without this fix, we still could encounter false positive reports in W+X
> checking since the rcu_barrier() here can not ensure the ordering now.
> 
> Even worse, the rcu_barrier() can introduce significant delay. Eric Chanudet
> reported that the rcu_barrier introduces ~0.1s delay on a PREEMPT_RT kernel.
> 
>   [0.291444] Freeing unused kernel memory: 5568K
>   [0.402442] Run /sbin/init as init process
> 
> With this fix, the above delay can be eliminated.
> 
> Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag")
> Signed-off-by: Changbin Du 
> Cc: Xiaoyi Su 
> Cc: Eric Chanudet 
> Cc: Luis Chamberlain 
> Tested-by: Eric Chanudet 

Acked-by: Luis Chamberlain 

  Luis

Re: [PATCH v3] modules: wait do_free_init correctly

2024-02-21 Thread Luis Chamberlain

+ live-patching folks,

Finally, things are starting to be much clearer. Thanks for the time
for working on this, some more comments below and a question which
I think deserves some attention.

On Sat, Feb 17, 2024 at 04:18:10PM +0800, Changbin Du wrote:
> The synchronization here is just to ensure the module init's been freed
> before doing W+X checking. 

Some nits, this should read instead:

Fix the ordering of freeing of a module init so that it happens before
W+X checking.

> But the commit 1a7b7d922081 ("modules: Use
> vmalloc special flag") moves do_free_init() into a global workqueue
> instead of call_rcu(). So now rcu_barrier() can not ensure that do_free_init
> has completed. We should wait it via flush_work().

Remove "But" and adjust as:

Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved
calling do_free_init() into a global workqueue instead of relying on it
being called through call_rcu(..., do_free_init), which used to allowed us
call do_free_init() asynchronously after the end of a subsequent grace  

period. The move to a global workqueue broke the gaurantees for code
which needed to be sure the do_free_init() would complete with rcu_barrier().
To fix this callers which used to rely on rcu_barrier() must now instead
use flush_work(_free_wq).

> Without this fix, we still could encounter false positive reports in
> W+X checking,

This is good thanks for the clarification.

I think it would be useful for the commit log then to describe also that
it is not that the freeing was not happening, it is just that our sanity
checkers raced against the permission checkers which assume init memory
is already gone.

> and the rcu synchronization is unnecessary which can
> introduce significant delay.

While this can be true, I am not sure if we can remove it. See below.

> Eric Chanudet reports that the rcu_barrier introduces ~0.1s delay on a
> PREEMPT_RT kernel.

That's a separate issue.

>   [0.291444] Freeing unused kernel memory: 5568K
>   [0.402442] Run /sbin/init as init process
> 
> With this fix, the above delay can be eliminated.
> 
> Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag")
> Signed-off-by: Changbin Du 
> Cc: Xiaoyi Su 
> Cc: Eric Chanudet 
> 
> ---
> v3:
>   - amend comment in do_init_module() and update commit msg.
> v2:
>   - fix compilation issue for no CONFIG_MODULES found by 0-DAY.
> ---
>  include/linux/moduleloader.h | 8 
>  init/main.c  | 5 +++--
>  kernel/module/main.c | 9 +++--
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 001b2ce83832..89b1e0ed9811 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -115,6 +115,14 @@ int module_finalize(const Elf_Ehdr *hdr,
>   const Elf_Shdr *sechdrs,
>   struct module *mod);
>  
> +#ifdef CONFIG_MODULES
> +void flush_module_init_free_work(void);
> +#else
> +static inline void flush_module_init_free_work(void)
> +{
> +}
> +#endif
> +
>  /* Any cleanup needed when module leaves. */
>  void module_arch_cleanup(struct module *mod);
>  
> diff --git a/init/main.c b/init/main.c
> index e24b0780fdff..f0b7e21ac67f 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -99,6 +99,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -1402,11 +1403,11 @@ static void mark_readonly(void)
>   if (rodata_enabled) {
>   /*
>* load_module() results in W+X mappings, which are cleaned
> -  * up with call_rcu().  Let's make sure that queued work is
> +  * up with init_free_wq. Let's make sure that queued work is
>* flushed so that we don't hit false positives looking for
>* insecure pages which are W+X.
>*/
> - rcu_barrier();

Was this the only source of waiters that used rcu_barrier() to sync ?
What about kallsyms, live-patching ?

This original source to the addition of this rcu_barrier() (in a slight
older modified form with with rcu_barrier_sched()) was commit
ae646f0b9ca13 ("init: fix false positives in W+X checking") since
v4.17 in 2018, 6 years ago. So I'm hoping we don't have any other
side-by new users which have grown dependent on this rcu_barrier() for
other call_rcu()'s they may have used, but it is hard to tell.

So while I agree that flush work is the right solution, removing the
rcu_barrier() is technically another change which could potentially
regress for other reasons now. It is perhaps safe, but I'm used to
surprises for minor changes like these. So I think it makes sense to
lift it now, and test it in the wild to see what could possibly break,
I'd much prefer to split this as two separate commits. One which does
the fix, and another that lifts the rcu_barrier() with the stated
rationale and savings on time of ~0.1s on PREEMPT_RT kernels.

  Luis

Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-02 Thread Luis Chamberlain

On Thu, Feb 01, 2024 at 03:27:54PM +0100, Marco Pagani wrote:
> 
> On 2024-01-30 21:47, Luis Chamberlain wrote:
> > 
> > It very much sounds like there is a desire to have this but without a
> > user, there is no justification.
> 
> I was working on a set of patches to fix an issue in the fpga subsystem
> when I came across your commit 557aafac1153 ("kernel/module: add
> documentation for try_module_get()") that made me realize we also had a
> safety problem. 
> 
> To solve this problem for the fpga manager, we had to add a mutex to
> ensure the low-level module still exists before calling
> try_module_get(). However, having a safer version of try_module_get()
> would have simplified the code and made it more robust against changes.
> 
> https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/
> 
> I suspect there may be other cases where try_module_get() is
> inadvertently called without ensuring that the module still exists
> that may benefit from a safer implementation.

Maybe so, however I'm not yet sure if this is safe from deadlocks.
Please work on a series of selftest simple modules which demonstrate
its use / and a simple bash script selftest loader which verifies this
won't bust. Consider you may have third party modules which also race
with this too, and other users without this new API.

> >> +bool try_module_get_safe(struct module *module)
> >> +{
> >> +  struct module *mod;
> >> +  bool ret = true;
> >> +
> >> +  if (!module)
> >> +  goto out;
> >> +
> >> +  mutex_lock(_mutex);
> > 
> > If a user comes around then this should be mutex_lock_interruptible(),
> > and add might_sleep()
> 
> Would it be okay to return false if it gets interrupted, or should I
> change the return type to int to propagate -EINTR? My concern with
> changing the signature is that it would be less straightforward to
> use the function in place of try_module_get().

Since we want a safe mechanism we might as well not allow a simple drop
in replacement but a more robust one so that users take care of the
return value properly.

  Luis

Re: [PATCH v2 0/3] modules: few of alignment fixes

2024-02-02 Thread Luis Chamberlain

On Sat, Feb 03, 2024 at 12:20:38AM +0900, Masahiro Yamada wrote:
> On Fri, Feb 2, 2024 at 3:05 AM Luis Chamberlain  wrote:
> >
> > On Wed, Jan 31, 2024 at 02:11:44PM -0800, Luis Chamberlain wrote:
> > > On Mon, Jan 29, 2024 at 11:26:39AM -0800, Luis Chamberlain wrote:
> > > > Masahiro, if there no issues feel free to take this or I can take them 
> > > > in
> > > > too via the modules-next tree. Lemme know!
> > >
> > > I've queued this onto modules-testing to get winder testing [0]
> >
> > I've moved it to modules-next as I've found no issues.
> >
> >   Luis
> 
> 
> I believe this patch series is wrong.
> 
> I thought we agreed that the alignment must be added to
> individual asm code, not to the linker script.
>
> I am surprised that you came back to this.

I misseed the dialog on the old cover letter, sorry. I've yanked these patches
out. I'd expect a respin from Helge.

  Luis

Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-02-01 Thread Luis Chamberlain

On Thu, Feb 01, 2024 at 12:29:46PM +0300, Andrew Kanner wrote:
> On Thu, Feb 01, 2024 at 06:29:58AM +0100, Christoph Hellwig wrote:
> > On Wed, Jan 31, 2024 at 10:02:52PM +0300, Andrew Kanner wrote:
> > > Prototype for __symbol_get_gpl() was introduced in the initial git
> > > commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), but was not used after that.
> > > 
> > > In commit 9011e49d54dc ("modules: only allow symbol_get of
> > > EXPORT_SYMBOL_GPL modules") Christoph Hellwig switched __symbol_get()
> > > to process GPL symbols only, most likely this is what
> > > __symbol_get_gpl() was designed to do.
> > > 
> > > We might either define __symbol_get_gpl() as __symbol_get() or remove
> > > it completely as suggested by Mauro Carvalho Chehab.
> > 
> > Just remove it, there is no need to keep unused funtionality around.
> > 
> > Btw, where did the discussion start?  I hope you're not trying to
> > add new symbol_get users?
> > 
> 
> Of course not, no new users needed.
> 
> I haven't discussed it directly. I found the unused __symbol_get_gpl()
> myself, but during investigation of wether it was ever used somewhere
> found the old patch series suggested by Mauro Carvalho Chehab (in Cc).
> 
> Link: 
> https://lore.kernel.org/lkml/5f001015990a76c0da35a4c3cf08e457ec353ab2.1652113087.git.mche...@kernel.org/
> 
> The patch series is from 2022 and not merged. You can take [PATCH v6
> 1/4] which removes the unused symbol from the link.
> 
> Or I can resend v2 with my commit msg. But not sure about how it works
> in such a case - will adding Suggested-by tag (if no objections from
> Mauro) with the Link be ok?

While you're at it, if you want to try it, you could see if you can
improve the situation more by looking at symbol_get() users that remain
and seeing if you can instead fix it with proper Kconfig dependency and
at build time. Then we can just remove it as well.

  Luis

Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-02-01 Thread Luis Chamberlain

On Fri, Nov 03, 2023 at 09:20:44PM -0700, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org

Applied and pushed, thanks!

  Luis

Re: [PATCH v2 0/3] modules: few of alignment fixes

2024-02-01 Thread Luis Chamberlain

On Wed, Jan 31, 2024 at 02:11:44PM -0800, Luis Chamberlain wrote:
> On Mon, Jan 29, 2024 at 11:26:39AM -0800, Luis Chamberlain wrote:
> > Masahiro, if there no issues feel free to take this or I can take them in
> > too via the modules-next tree. Lemme know!
> 
> I've queued this onto modules-testing to get winder testing [0]

I've moved it to modules-next as I've found no issues.

  Luis

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-31 Thread Luis Chamberlain

On Wed, Jan 31, 2024 at 06:53:13AM +, Christophe Leroy wrote:
> The problem being identified in commit 677bfb9db8a3 ("module: Don't 
> ignore errors from set_memory_XX()"), you can keep/re-apply the series 
> [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time.

Sure, queued that up into modules-testing before I push to modules-next.

  Luis

Re: [PATCH v2 0/3] modules: few of alignment fixes

2024-01-31 Thread Luis Chamberlain

On Mon, Jan 29, 2024 at 11:26:39AM -0800, Luis Chamberlain wrote:
> Masahiro, if there no issues feel free to take this or I can take them in
> too via the modules-next tree. Lemme know!

I've queued this onto modules-testing to get winder testing [0]

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-testing

  Luis

Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Luis Chamberlain

On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
> The current implementation of try_module_get() requires the module to
> exist and be live as a precondition. While this may seem intuitive at
> first glance, enforcing the precondition can be tricky, considering that
> modules can be unloaded at any time if not previously taken. For
> instance, the caller could be preempted just before calling
> try_module_get(), and while preempted, the module could be unloaded and
> freed. More subtly, the module could also be unloaded at any point while
> executing try_module_get() before incrementing the refount with
> atomic_inc_not_zero().
> 
> Neglecting the precondition that the module must exist and be live can
> cause unexpected race conditions that can lead to crashes. However,
> ensuring that the precondition is met may require additional locking
> that increases the complexity of the code and can make it more
> error-prone.
> 
> This patch adds a slower yet safer implementation of try_module_get()
> that checks if the module is valid by looking into the mod_tree before
> taking the module's refcount. This new function can be safely called on
> stale and invalid module pointers, relieving developers from the burden
> of ensuring that the module exists and is live before attempting to take
> it.
> 
> The tree lookup and refcount increment are executed after taking the
> module_mutex to prevent the module from being unloaded after looking up
> the tree.
> 
> Signed-off-by: Marco Pagani 

It very much sounds like there is a desire to have this but without a
user, there is no justification.

> +bool try_module_get_safe(struct module *module)
> +{
> + struct module *mod;
> + bool ret = true;
> +
> + if (!module)
> + goto out;
> +
> + mutex_lock(_mutex);

If a user comes around then this should be mutex_lock_interruptible(),
and add might_sleep()

> +
> + /*
> +  * Check if the address points to a valid live module and take
> +  * the refcount only if it points to the module struct.
> +  */
> + mod = __module_address((unsigned long)module);
> + if (mod && mod == module && module_is_live(mod))
> + __module_get(mod);
> + else
> + ret = false;
> +
> + mutex_unlock(_mutex);
> +
> +out:
> + return ret;
> +}
> +EXPORT_SYMBOL(try_module_get_safe);

And EXPORT_SYMBOL_GPL() would need to be used.

I'd also expect selftests to be expanded for this case, but again,
without a user, this is just trying to resolve a problem which does not
exist.

  Luis

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-30 Thread Luis Chamberlain

On Tue, Jan 30, 2024 at 06:48:11PM +0100, Marek Szyprowski wrote:
> Dear All,
> 
> On 30.01.2024 12:03, Christophe Leroy wrote:
> > Le 30/01/2024 à 10:16, Chen-Yu Tsai a écrit :
> >> [Vous ne recevez pas souvent de courriers de we...@chromium.org. D?couvrez 
> >> pourquoi ceci est important ? 
> >> https://aka.ms/LearnAboutSenderIdentification ]
> >>
> >> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
> >>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
> >>>> Declaring rodata_enabled and mark_rodata_ro() at all time
> >>>> helps removing related #ifdefery in C files.
> >>>>
> >>>> Signed-off-by: Christophe Leroy 
> >>> Very nice cleanup, thanks!, applied and pushed
> >>>
> >>> Luis
> >> On next-20240130, which has your modules-next branch, and thus this
> >> series and the other "module: Use set_memory_rox()" series applied,
> >> my kernel crashes in some very weird way. Reverting your branch
> >> makes the crash go away.
> >>
> >> I thought I'd report it right away. Maybe you folks would know what's
> >> happening here? This is on arm64.
> > That's strange, it seems to bug in module_bug_finalize() which is
> > _before_ calls to module_enable_ro() and such.
> >
> > Can you try to revert the 6 patches one by one to see which one
> > introduces the problem ?
> >
> > In reality, only patch 677bfb9db8a3 really change things. Other ones are
> > more on less only cleanup.
> 
> I've also run into this issue with today's (20240130) linux-next on my 
> test farm. The issue is not fully reproducible, so it was a bit hard to 
> bisect it automatically. I've spent some time on manual testing and it 
> looks that reverting the following 2 commits on top of linux-next fixes 
> the problem:
> 
> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around 
> rodata_enabled")
> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
> 
> This in fact means that commit 677bfb9db8a3 is responsible for this 
> regression, as 65929884f868 has to be reverted only because the latter 
> depends on it. Let me know what I can do to help debugging this issue.

Thanks for the bisect, I've reset my tree to commit
3559ad395bf02 ("module: Change module_enable_{nx/x/ro}() to more
explicit names") for now then, so to remove those commits.

  Luis

Re: [RESEND PATCH v2] modules: wait do_free_init correctly

2024-01-30 Thread Luis Chamberlain

On Tue, Jan 30, 2024 at 09:40:38AM +0800, Changbin Du wrote:
> On Mon, Jan 29, 2024 at 09:53:58AM -0800, Luis Chamberlain wrote:
> > On Mon, Jan 29, 2024 at 10:03:04AM +0800, Changbin Du wrote:
> > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves
> > > do_free_init() into a global workqueue instead of call_rcu(). So now
> > > rcu_barrier() can not ensure that do_free_init has completed. We should
> > > wait it via flush_work().
> > > 
> > > Without this fix, we still could encounter false positive reports in
> > > W+X checking, and rcu synchronization is unnecessary.
> > 
> > You didn't answer my question, which should be documented in the commit log.
> > 
> > Does this mean we never freed modules init because of this? If so then
> > your commit log should clearly explain that. It should also explain that
> > if true (you have to verify) then it means we were no longer saving
> > the memory we wished to save, and that is important for distributions
> > which do want to save anything on memory. You may want to do a general
> > estimate on how much that means these days on any desktop / server.
>
> Actually, I have explained it in commit msg. It's not about saving memory. The
> synchronization here is just to ensure the module init's been freed before
> doing W+X checking. The problem is that the current implementation is wrong,
> rcu_barrier() cannot guarantee that. So we can encounter false positive 
> reports.
> But anyway, the module init will be freed, and it's just a timing related 
> issue.

Your desciption here is better than the commit log.

  Luis

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-29 Thread Luis Chamberlain

On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
> Declaring rodata_enabled and mark_rodata_ro() at all time
> helps removing related #ifdefery in C files.
> 
> Signed-off-by: Christophe Leroy 

Very nice cleanup, thanks!, applied and pushed

  Luis

Re: [PATCH 1/3] module: Use set_memory_rox()

2024-01-29 Thread Luis Chamberlain

On Thu, Dec 21, 2023 at 08:24:23AM +0100, Christophe Leroy wrote:
> A couple of architectures seem concerned about calling set_memory_ro()
> and set_memory_x() too frequently and have implemented a version of
> set_memory_rox(), see commit 60463628c9e0 ("x86/mm: Implement native
> set_memory_rox()") and commit 22e99fa56443 ("s390/mm: implement
> set_memory_rox()")
> 
> Use set_memory_rox() in modules when STRICT_MODULES_RWX is set.
> 
> Signed-off-by: Christophe Leroy 

Nice simplification. I applied all 3 patches and pushed!

  Luis

[PATCH v2 1/4] selftests: add new kallsyms selftests

2024-01-29 Thread Luis Chamberlain

We lack find_symbol() selftests, so add one. This let's us stress test
improvements easily on find_symbol() or optimizations. It also inherently
allows us to test the limits of kallsyms on Linux today.

We test a pathalogical use case for kallsyms by introducing modules
which are automatically written for us with a larger number of symbols.
We have 4 kallsyms test modules:

A: has KALLSYSMS_NUMSYMS exported symbols
B: uses one of A's symbols
C: adds KALLSYMS_SCALE_FACTOR * KALLSYSMS_NUMSYMS exported
D: adds 2 * the symbols than C

By using anything much larger than KALLSYSMS_NUMSYMS as 10,000 and
KALLSYMS_SCALE_FACTOR of 8 we segfault today. So we're capped at
around 16 symbols somehow today. We can inpsect that issue at
our leasure later, but for now the real value to this test is that
this will easily allow us to test improvements on find_symbol().

On x86_64 we can use perf, for other architectures we just use 'time'
and allow for customizations. For example a future enhancements could
be done for parisc to check for unaligned accesses which triggers a
special special exception handler assembler code inside the kernel.
The negative impact on performance is so large on parisc that it
keeps track of its accesses on /proc/cpuinfo as UAH:

IRQ:   CPU0   CPU1
3:   1332  0 SuperIO  ttyS0
7:1270013  0 SuperIO  pata_ns87415
64:  320023012  320021431 CPU  timer
65:   17080507   20624423 CPU  IPI
UAH:   10948640  58104   Unaligned access handler traps

While at it, this tidies up lib/ test modules to allow us to have
a new directory for them. The amount of test modules under lib/
is insane.

This should also hopefully showcase how to start doing basic
self module writing code, which may be more useful for more complex
cases later in the future.

Signed-off-by: Luis Chamberlain 
---
 lib/Kconfig.debug | 103 ++
 lib/Makefile  |   1 +
 lib/tests/Makefile|   1 +
 lib/tests/module/.gitignore   |   4 +
 lib/tests/module/Makefile |  15 ++
 lib/tests/module/gen_test_kallsyms.sh | 128 ++
 tools/testing/selftests/module/Makefile   |  12 ++
 tools/testing/selftests/module/config |   3 +
 tools/testing/selftests/module/find_symbol.sh |  81 +++
 9 files changed, 348 insertions(+)
 create mode 100644 lib/tests/Makefile
 create mode 100644 lib/tests/module/.gitignore
 create mode 100644 lib/tests/module/Makefile
 create mode 100755 lib/tests/module/gen_test_kallsyms.sh
 create mode 100644 tools/testing/selftests/module/Makefile
 create mode 100644 tools/testing/selftests/module/config
 create mode 100755 tools/testing/selftests/module/find_symbol.sh

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 97ce28f4d154..29db47ca251f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2835,6 +2835,109 @@ config TEST_KMOD
 
  If unsure, say N.
 
+config TEST_RUNTIME
+   bool
+
+config TEST_RUNTIME_MODULE
+   bool
+
+config TEST_KALLSYMS
+   tristate "module kallsyms find_symbol() test"
+   depends on m
+   select TEST_RUNTIME
+   select TEST_RUNTIME_MODULE
+   select TEST_KALLSYMS_A
+   select TEST_KALLSYMS_B
+   select TEST_KALLSYMS_C
+   select TEST_KALLSYMS_D
+   help
+ This allows us to stress test find_symbol() through the kallsyms
+ used to place symbols on the kernel ELF kallsyms and modules kallsyms
+ where we place kernel symbols such as exported symbols.
+
+ We have four test modules:
+
+ A: has KALLSYSMS_NUMSYMS exported symbols
+ B: uses one of A's symbols
+ C: adds KALLSYMS_SCALE_FACTOR * KALLSYSMS_NUMSYMS exported
+ D: adds 2 * the symbols than C
+
+ We stress test find_symbol() through two means:
+
+ 1) Upon load of B it will trigger simplify_symbols() to look for the
+ one symbol it uses from the module A with tons of symbols. This is an
+ indirect way for us to have B call resolve_symbol_wait() upon module
+ load. This will eventually call find_symbol() which will eventually
+ try to find the symbols used with find_exported_symbol_in_section().
+ find_exported_symbol_in_section() uses bsearch() so a binary search
+ for each symbol. Binary search will at worst be O(log(n)) so the
+ larger TEST_MODULE_KALLSYSMS the worse the search.
+
+ 2) The selftests should load C first, before B. Upon B's load towards
+ the end right before we call module B's init routine we get
+ complete_formation() called on the module. That will first check
+ for duplicate symbols with the call to verify_exported_symbols().
+ That is when we'll force iteration on module C's insane symbol list.
+ Since it has 10 * KALLSYMS_NUMSYMS it me

[PATCH v2 4/4] modules: Add missing entry for __ex_table

2024-01-29 Thread Luis Chamberlain

From: Helge Deller 

The entry for __ex_table was missing, which may make __ex_table
become 1- or 2-byte aligned in modules.
Add the entry to ensure it gets 32-bit aligned.

Signed-off-by: Helge Deller 
Signed-off-by: Luis Chamberlain 
---
 scripts/module.lds.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index b00415a9ff27..488f61b156b2 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -26,6 +26,7 @@ SECTIONS {
.altinstructions0 : ALIGN(8) { KEEP(*(.altinstructions)) }
__bug_table 0 : ALIGN(8) { KEEP(*(__bug_table)) }
__jump_table0 : ALIGN(8) { KEEP(*(__jump_table)) }
+   __ex_table  0 : ALIGN(4) { KEEP(*(__ex_table)) }
 
__patchable_function_entries : { *(__patchable_function_entries) }
 
-- 
2.42.0

[PATCH v2 0/3] modules: few of alignment fixes

2024-01-29 Thread Luis Chamberlain

Helge Deller had written a series of patches to fix a few 
misalignemnt annotations in the kernel [0]. Three of these
patches were tagged as being stable candidates. Because of these
annotation I suggested proof of the imapact, however we did not
easily have a way to verify the value / impact of fixing a few
misaligment changes in the kernel for modules.

For the more hotpath alignment fix Helge had I suggested we could
easily test this by stress testing find_symbol() with a few set of
modules. This adds such tests to allow us to both allow us to test
the impact of such possible fix, and also while at it, let's us
test future improvements on the find_symbol() path.

Changes on this v2:

 - Adds new selftest for kallsyms
 - Drops patch #1 as Masahiro Yamada already applied it to linux-kbuild/fixes
 - Removes stable tags
 - Drops patch #3 as it was not needed
 - Adds a new patch with the issues noted by Helge as a fix
   to commit f3304ecd7f06 ("linux/export: use inline assembler to
   populate symbol CRCs") as noted by Masahiro Yamada
 - Adds selftest impact on x86_64 for patch #2, this really should
   be tested on parisc though as that's an example architecture
   where we could see perhaps more improvement

[0] https://lkml.kernel.org/r/2023111814.139916-1-del...@kernel.org

Masahiro, if there no issues feel free to take this or I can take them in
too via the modules-next tree. Lemme know!

Helge Deller (2):
  modules: Ensure 64-bit alignment on __ksymtab_* sections
  modules: Add missing entry for __ex_table

Luis Chamberlain (2):
  selftests: add new kallsyms selftests
  vmlinux.lds.h: add missing alignment for symbol CRCs

 include/linux/export-internal.h   |   1 +
 lib/Kconfig.debug | 103 ++
 lib/Makefile  |   1 +
 lib/tests/Makefile|   1 +
 lib/tests/module/.gitignore   |   4 +
 lib/tests/module/Makefile |  15 ++
 lib/tests/module/gen_test_kallsyms.sh | 128 ++
 scripts/module.lds.S  |   9 +-
 tools/testing/selftests/module/Makefile   |  12 ++
 tools/testing/selftests/module/config |   3 +
 tools/testing/selftests/module/find_symbol.sh |  81 +++
 11 files changed, 354 insertions(+), 4 deletions(-)
 create mode 100644 lib/tests/Makefile
 create mode 100644 lib/tests/module/.gitignore
 create mode 100644 lib/tests/module/Makefile
 create mode 100755 lib/tests/module/gen_test_kallsyms.sh
 create mode 100644 tools/testing/selftests/module/Makefile
 create mode 100644 tools/testing/selftests/module/config
 create mode 100755 tools/testing/selftests/module/find_symbol.sh

-- 
2.42.0

[PATCH v2 3/4] vmlinux.lds.h: add missing alignment for symbol CRCs

2024-01-29 Thread Luis Chamberlain

Commit f3304ecd7f06 (linux/export: use inline assembler to populate
symbol CRCs") fixed an issue with unexpected padding  by adding
asm inline assembly to directly specify the desired data layout.
It however forgot to add the alignment, fix that.

Reported-by: Helge Deller 
Fixes: f3304ecd7f06 ("linux/export: use inline assembler to populate symbol 
CRCs")
Signed-off-by: Luis Chamberlain 
---
 include/linux/export-internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h
index 69501e0ec239..51b8cf3f60ef 100644
--- a/include/linux/export-internal.h
+++ b/include/linux/export-internal.h
@@ -61,6 +61,7 @@
 
 #define SYMBOL_CRC(sym, crc, sec)   \
asm(".section \"___kcrctab" sec "+" #sym "\",\"a\"" "\n" \
+   ".balign 4" "\n" \
"__crc_" #sym ":"   "\n" \
".long " #crc   "\n" \
".previous" "\n")
-- 
2.42.0

[PATCH v2 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections

2024-01-29 Thread Luis Chamberlain

From: Helge Deller 

On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
(e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores
64-bit pointers into the __ksymtab* sections.
Make sure that those sections will be correctly aligned at module link time,
otherwise unaligned memory accesses may happen at runtime.

The __kcrctab* sections store 32-bit entities, so use ALIGN(4) for those.

Testing with the kallsyms selftest on x86_64 we see a savings of about
1,958,153 ns in the worst case which may or not be within noise. Testing
on parisc would be useful and welcomed.

On x86_64 before:

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

86,430,119 ns   duration_time
84,407,000 ns   system_time
   213  page-faults

   0.086430119 seconds time elapsed

   0.0 seconds user
   0.084407000 seconds sys

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

85,777,474 ns   duration_time
82,581,000 ns   system_time
   212  page-faults

   0.085777474 seconds time elapsed

   0.0 seconds user
   0.082581000 seconds sys

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

87,906,053 ns   duration_time
87,939,000 ns   system_time
   212  page-faults

   0.087906053 seconds time elapsed

   0.0 seconds user
   0.087939000 seconds sys

After:

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

82,925,631 ns   duration_time
83,000,000 ns   system_time
   212  page-faults

   0.082925631 seconds time elapsed

   0.0 seconds user
   0.08300 seconds sys

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

87,776,380 ns   duration_time
86,678,000 ns   system_time
   213  page-faults

   0.087776380 seconds time elapsed

   0.0 seconds user
   0.086678000 seconds sys

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

85,947,900 ns   duration_time
82,006,000 ns   system_time
   212  page-faults

   0.085947900 seconds time elapsed

   0.0 seconds user
   0.082006000 seconds sys

Signed-off-by: Helge Deller 
[mcgrof: ran kallsyms selftests on x86_64]
Signed-off-by: Luis Chamberlain 
---
 scripts/module.lds.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index bf5bcf2836d8..b00415a9ff27 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -15,10 +15,10 @@ SECTIONS {
*(.discard.*)
}
 
-   __ksymtab   0 : { *(SORT(___ksymtab+*)) }
-   __ksymtab_gpl   0 : { *(SORT(___ksymtab_gpl+*)) }
-   __kcrctab   0 : { *(SORT(___kcrctab+*)) }
-   __kcrctab_gpl   0 : { *(SORT(___kcrctab_gpl+*)) }
+   __ksymtab   0 : ALIGN(8) { *(SORT(___ksymtab+*)) }
+   __ksymtab_gpl   0 : ALIGN(8) { *(SORT(___ksymtab_gpl+*)) }
+   __kcrctab   0 : ALIGN(4) { *(SORT(___kcrctab+*)) }
+   __kcrctab_gpl   0 : ALIGN(4) { *(SORT(___kcrctab_gpl+*)) }
 
.ctors  0 : ALIGN(8) { *(SORT(.ctors.*)) *(.ctors) }
.init_array 0 : ALIGN(8) { *(SORT(.init_array.*)) 
*(.init_array) }
-- 
2.42.0

Re: [PATCH 4/4] modules: Add missing entry for __ex_table

2024-01-29 Thread Luis Chamberlain

On Wed, Nov 22, 2023 at 11:18:14PM +0100, del...@kernel.org wrote:
> From: Helge Deller 
> 
> The entry for __ex_table was missing, which may make __ex_table
> become 1- or 2-byte aligned in modules.
> Add the entry to ensure it gets 32-bit aligned.
> 
> Signed-off-by: Helge Deller 
> Cc:  # v6.0+

Cc'ing stable was overkill, I'll remove it.

  Luis

> ---
>  scripts/module.lds.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> index b00415a9ff27..488f61b156b2 100644
> --- a/scripts/module.lds.S
> +++ b/scripts/module.lds.S
> @@ -26,6 +26,7 @@ SECTIONS {
>   .altinstructions0 : ALIGN(8) { KEEP(*(.altinstructions)) }
>   __bug_table 0 : ALIGN(8) { KEEP(*(__bug_table)) }
>   __jump_table0 : ALIGN(8) { KEEP(*(__jump_table)) }
> + __ex_table  0 : ALIGN(4) { KEEP(*(__ex_table)) }
>  
>   __patchable_function_entries : { *(__patchable_function_entries) }
>  
> -- 
> 2.41.0
> 
>

Re: [RESEND PATCH v2] modules: wait do_free_init correctly

2024-01-29 Thread Luis Chamberlain

On Mon, Jan 29, 2024 at 10:03:04AM +0800, Changbin Du wrote:
> The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves
> do_free_init() into a global workqueue instead of call_rcu(). So now
> rcu_barrier() can not ensure that do_free_init has completed. We should
> wait it via flush_work().
> 
> Without this fix, we still could encounter false positive reports in
> W+X checking, and rcu synchronization is unnecessary.

You didn't answer my question, which should be documented in the commit log.

Does this mean we never freed modules init because of this? If so then
your commit log should clearly explain that. It should also explain that
if true (you have to verify) then it means we were no longer saving
the memory we wished to save, and that is important for distributions
which do want to save anything on memory. You may want to do a general
estimate on how much that means these days on any desktop / server.

  Luis

Re: [PATCH 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections

2024-01-22 Thread Luis Chamberlain

On Mon, Jan 22, 2024 at 05:47:49PM +0100, Helge Deller wrote:
> On 1/22/24 17:10, Luis Chamberlain wrote:
> > 
> > It's within the noise for x86_64, but given what you suggest
> > for parisc where it is much more expensive, we should see a non-noise
> > delta. Even just time on loading the module should likely result in
> > a considerable delta than on x86_64. You may just need to play a bit
> > with the default values at build time.
> 
> I don't know if it will be a "considerable" amount of time.

There are variables which you can tune, so given what you suggest
it should be possible to get to that spot rather easily with a few
changes to the default values I think.

> > If you don't feel like doing that test that's fine too, we can just
> > ignore that.
> 
> I can do that test, but I won't have time for that in the next few weeks...

I would appreciate it!

  Luis

Re: [PATCH 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections

2024-01-22 Thread Luis Chamberlain

On Sat, Dec 30, 2023 at 08:33:24AM +0100, Helge Deller wrote:
> Your selftest code is based on perf.
> AFAICS we don't have perf on parisc/hppa, 

I see!

> so I can't test your selftest code
> on that architecture.
> I assume you tested on x86, where the CPU will transparently take care of
> unaligned accesses. This is probably why the results are within
> the noise.
> But on some platforms the CPU raises an exception on unaligned accesses
> and jumps into special exception handler assembler code inside the kernel.
> This is much more expensive than on x86, which is why we track on parisc
> in /proc/cpuinfo counters on how often this exception handler is called:
> IRQ:   CPU0   CPU1
>   3:   1332  0 SuperIO  ttyS0
>   7:1270013  0 SuperIO  pata_ns87415
>  64:  320023012  320021431 CPU  timer
>  65:   17080507   20624423 CPU  IPI
> UAH:   10948640  58104   Unaligned access handler traps
> 
> This "UAH" field could theoretically be used to extend your selftest.

Nice!

> But is it really worth it? The outcome is very much architecture and CPU
> specific, maybe it's just within the noise as you measured.

It's within the noise for x86_64, but given what you suggest
for parisc where it is much more expensive, we should see a non-noise
delta. Even just time on loading the module should likely result in
a considerable delta than on x86_64. You may just need to play a bit
with the default values at build time.

> IMHO we should always try to natively align structures, and if we see
> we got it wrong in kernel code, we should fix it.

This was all motivated by the first review criteria of these patches
as if they were stable worthy or not. Even if we don't consider them
stable material, given the test is now written and easily extended to
test on parisc with just timing information and UAH I think it would
be nice to have this data for a few larger default factor values so we
can compare against x86_64 while we're at it.

If you don't feel like doing that test that's fine too, we can just
ignore that. I'll still apply the patches but, I figured I'd ask to
collect information while the test was already written and it should
now be easy to compare / contrast differences.

  Luis

[GIT PULL] Modules changes for v6.8-rc1

2024-01-09 Thread Luis Chamberlain

The following changes since commit ceb6a6f023fd3e8b07761ed900352ef574010bcb:

  Linux 6.7-rc6 (2023-12-17 15:19:28 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-6.8-rc1

for you to fetch changes up to 4515d08a742c76612b65d2f47a87d12860519842:

  kernel/module: improve documentation for try_module_get() (2023-12-21 
10:26:14 -0800)


Modules changes for v6.8-rc1

Just one cleanup and one documentation improvement change. No functional
changes. However, this has been tested on linux-next for over 1 month.


Kevin Hao (1):
  module: Remove redundant TASK_UNINTERRUPTIBLE

Marco Pagani (1):
  kernel/module: improve documentation for try_module_get()

 include/linux/module.h | 2 +-
 kernel/module/dups.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Re: [PATCH 1/4] linux/export: Fix alignment for 64-bit ksymtab entries

2023-12-22 Thread Luis Chamberlain

On Fri, Dec 22, 2023 at 04:01:30PM +0900, Masahiro Yamada wrote:
> On Fri, Dec 22, 2023 at 3:08 PM Luis Chamberlain  wrote:
> >
> > On Thu, Dec 21, 2023 at 10:07:13PM -0800, Luis Chamberlain wrote:
> > >
> > > If we want to go bananas we could even get a graph of size of modules
> >
> > Sorry I meant size of number of symbols Vs cost.
> >
> >  Luis
> 
> 
> 
> But, 1/4 is really a bug-fix, isn't it?

Ah you mean a regression fix, yeah sure, thanks I see !

 Luis

Re: [PATCH 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections

2023-12-22 Thread Luis Chamberlain

On Fri, Dec 22, 2023 at 01:13:26PM +0100, Helge Deller wrote:
> Hi Luis,
> 
> On 12/22/23 06:59, Luis Chamberlain wrote:
> > On Wed, Nov 22, 2023 at 11:18:12PM +0100, del...@kernel.org wrote:
> > > From: Helge Deller 
> > > 
> > > On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
> > > (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores
> > > 64-bit pointers into the __ksymtab* sections.
> > > Make sure that those sections will be correctly aligned at module link 
> > > time,
> > > otherwise unaligned memory accesses may happen at runtime.
> > 
> > The ramifications are not explained there.
> 
> What do you mean exactly by this?

You don't explain the impact of not applying this patch.

> > You keep sending me patches with this and we keep doing a nose dive
> > on this. It means I have to do more work.
> Sorry about that. Since you are mentioned as maintainer for modules I
> thought you are the right contact.

I am certaintly the right person to send this patch to, however, I am
saying that given our previous dialog I would like the commit log to
describe a bit more detail of a few things:

 * how you found this
 * what are the impact of not applying it

Specially if you are going to be sending patches right before the
holidays with a Cc stable fix annotation. This gets me on hight alert
and I have to put things down to see how really critical this is so to
decide to fast track this to Linus or not.

> Furthermore, this patch is pretty small and trivial.

Many patches can be but some can break things and some can be small but
also improve things critically, for example if we are aware of broken
exception handlers.

> And I had the impression that people understand that having unaligned
> structures is generally a bad idea as it usually always impacts performance
> (although the performance penalty in this case isn't critical at all,
> as we are not on hot paths).

There are two aspects to this, one is the critical nature which is
implied by your patch which pegs it as a stable fix, given you prior
patches about this it leaves me wondering if it is fixing some crashes
on some systems given faulty exception handlers.

The performance thing is also subjective, but it could not be subjective
by doing some very trivial tests as I suggested. Such a test would also
help as we lack specific selftsts for this case and we can grow it later
with other things. I figured I'd ask you to try it, since *you* keep
sending patches about misalignments on module sections so I figured
you must be seeing something others are not, and so you must care.

> > Thoughts?
> 
> I really appreciate your thoughts here!
> 
> But given the triviality of this patch, I personally think you are
> proposing a huge academic investment in time and resources with no real 
> benefit.
> On which architecture would you suggest to test this?
> What would be the effective (really new) outcome?
> If the performance penalty is zero or close to zero, will that imply
> that such a patch isn't useful?
> Please also keep in mind that not everyone gets paid to work on the kernel,
> so I have neither the time nor the various architectures to test on.

I think you make this seem so difficult, but I understand it could seem
that way. I've attached at the end of this email a patch that does just
this then to help.

> So, honestly I don't see a real reason why it shouldn't be applied...

Like I said, you Cc'd stable as a fix, as a maintainer it is my job to
verify how critical this is and ask for more details about how you found
it and evaluate the real impact. Even if it was not a stable fix I tend
to ask this for patches, even if they are trivial.

> > > Signed-off-by: Helge Deller 
> > > Cc:  # v6.0+
> > 
> > That's a stretch without any data, don't you think?
> 
> Yes. No need to push such a patch to stable series.

OK, can you extend the patch below with something like:

perf stat --repeat 100 --pre 'modprobe -r b a b c' -- 
./tools/testing/selftests/module/find_symbol.sh

And test before and after?

I ran a simple test as-is and the data I get is within noise, and so
I think we need the --repeat 100 thing.

---
before:
sudo ./tools/testing/selftests/module/find_symbol.sh 

 Performance counter stats for '/sbin/modprobe test_kallsyms_b':

81,956,206 ns   duration_time   

81,883,000 ns   system_time 

   210  page-faults 

   0.081956206 seconds time elapsed

   0.0 seconds user
   0.081

Re: [PATCH 1/4] linux/export: Fix alignment for 64-bit ksymtab entries

2023-12-21 Thread Luis Chamberlain

On Thu, Dec 21, 2023 at 10:07:13PM -0800, Luis Chamberlain wrote:
> 
> If we want to go bananas we could even get a graph of size of modules

Sorry I meant size of number of symbols Vs cost.

 Luis

Re: [PATCH 1/4] linux/export: Fix alignment for 64-bit ksymtab entries

2023-12-21 Thread Luis Chamberlain

On Fri, Dec 22, 2023 at 01:01:23AM +0900, Masahiro Yamada wrote:
> On Thu, Dec 21, 2023 at 7:22 PM Masahiro Yamada  wrote:
> >
> > On Thu, Nov 23, 2023 at 7:18 AM  wrote:
> > >
> > > From: Helge Deller 
> > >
> > > An alignment of 4 bytes is wrong for 64-bit platforms which don't define
> > > CONFIG_HAVE_ARCH_PREL32_RELOCATIONS (which then store 64-bit pointers).
> > > Fix their alignment to 8 bytes.
> > >
> > > Signed-off-by: Helge Deller 
> >
> >
> > This is correct.
> >
> > Acked-by: Masahiro Yamada 
> >
> > Please add
> >
> >
> > Fixes: ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
> >
> >
> 
> 
> If there is no objection, I will pick this up
> to linux-kbuild/fixes.

The new selftests I've suggested should help get perf data to cover both
modules and built-in kernel symbols given find_symbol() will first hit
built-in symbols first before modules with one caveat: we'd want to extend
the selftest with a part which builds a module built-in with also tons
of other symbols.

So I'm all for you taking this but I don't think we need to rush for the
same reasons I mentioned in my reply to Helge.

I think it would be nice to get real perf data with perf stat as I
suggested, and include that in the commit logs. I think it would also be
useful to include a description about the fact that there is no real fix
and that the performance hit is all that happens as the architecture
just emulates the aligment. In the worst case, if exception handlers
are broken we could crash but that is rare although it does happen.

If we want to go bananas we could even get a graph of size of modules
Vs cost on misaligment as a relationship with time. Without this, frankly
cost on "performance" is artificial.

Thoughts?

  Luis

Re: [PATCH 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections

2023-12-21 Thread Luis Chamberlain

On Wed, Nov 22, 2023 at 11:18:12PM +0100, del...@kernel.org wrote:
> From: Helge Deller 
> 
> On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
> (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores
> 64-bit pointers into the __ksymtab* sections.
> Make sure that those sections will be correctly aligned at module link time,
> otherwise unaligned memory accesses may happen at runtime.

The ramifications are not explained there. You keep sending me patches
with this and we keep doing a nose dive on this. It means I have to do
more work. So as I had suggested with your patch which I merged in
commit 87c482bdfa79 ("modules: Ensure natural alignment for
.altinstructions and __bug_table sections") please clarify the
impact of not merging this patch. Also last time you noticed the
misalignment due to a faulty exception handler, please mention how
you found this out now.

And since this is not your first patch on the exact related subject
I'd appreciate if you can show me perf stat results differences between
having and not having this patch merged. Why? Because we talk about
a performance penalthy, but we are not saying how much, and since this
is an ongoing thing, we might as well have a tool belt with ways to
measure such performance impact to bring clarity and value to this
and future related patches.

> The __kcrctab* sections store 32-bit entities, so use ALIGN(4) for those.

I've given some thought about how to test this. Sadly perf kallsysms
just opens the /proc/kallsysms file, but that's fine, we need our own
test.

I think a 3 new simple modules selftest would do it and running perf
stat on it. One module, let us call it module A which constructs its own
name space prefix for its exported symbols and has tons of silly symbols
for arbitrary data, whatever. We then have module B which refers to a
few arbitrary symbols from module A, hopefully spread out linearly, so
if module A had 10,000 symbols, we'd have module A refer to a symbol
ever 1,000 symbols. Finally we want a symbol C which has say, 50,000
symbols all of which will not be used at all by the first two modules,
but the selftest will load module C first, prior to calling modprobe B.

We'll stress test this way two calls which use find_symbol():

1) Upon load of B it will trigger simplify_symbols() to look for the
symbol it uses from the module A with tons of symbols. That's an
indirect way for us to call resolve_symbol_wait() from module A without
having to use symbol_get() which want to remove as exported as it is
just a hack which should go away. Our goal is for us to test
resolve_symbol() which will call find_symbol() and that will eventually
look for the symbol on module A with:

  find_exported_symbol_in_section()

That uses bsearch() so a binary search for the symbol and we'd end up
hitting the misalignments here. Binary search will at worst be O(log(n))
and so the only way to aggreviate the search will be to add tons of
symbols to A, and have B use a few of them.

2) When you load B, userspace will at first load A as depmod will inform
userspace A goes before B. Upon B's load towards the end right before
we call module B's init routine we get complete_formation() called on
the module. That will first check for duplicate symbols with the call
to verify_exported_symbols(). That is when we'll force iteration on
module C's insane symbol list.

The selftests just runs

perf stat -e pick-your-poison-for-misalignments 
tools/testing/selftests/kmod/ksymtab.sh

Where ksymtab.sh is your new script which calls:

modprobe C
modprobe B

I say pick-your-poison-for-misalignments because I am not sure what is
best here.

Thoughts?

> Signed-off-by: Helge Deller 
> Cc:  # v6.0+

That's a stretch without any data, don't you think?

 Luis

Re: [PATCH] kernel/module: improve documentation for try_module_get()

2023-12-21 Thread Luis Chamberlain

On Thu, Dec 21, 2023 at 05:58:47PM +0100, Marco Pagani wrote:
> The sentence "this call will fail if the module is already being
> removed" is potentially confusing and may contradict the rest of the
> documentation. If one tries to get a module that has already been
> removed using a stale pointer, the kernel will crash.
> 
> Signed-off-by: Marco Pagani 

Thanks, patch applied!

  Luis

Re: [PATCH 0/4] Section alignment issues?

2023-12-20 Thread Luis Chamberlain

On Tue, Dec 19, 2023 at 01:26:49PM -0800, Luis Chamberlain wrote:
> On Wed, Nov 22, 2023 at 11:18:10PM +0100, del...@kernel.org wrote:
> > From: Helge Deller 
> > My questions:
> > - Am I wrong with my analysis?
> 
> This would typically of course depend on the arch, but whether it helps
> is what I would like to see with real numbers rather then speculation.
> Howeer, I don't expect some of these are hot paths except maybe the
> table lookups. So could you look at some perf stat differences
> without / with alignment ? Other than bootup live patching would be
> a good test case. We have selftest for modules, the script in selftests
> tools/testing/selftests/kmod/kmod.sh is pretty aggressive, but the live
> patching tests might be better suited.
> 
> > - What does people see on other architectures?
> > - Does it make sense to add a compile- and runtime-check, like the patch 
> > below, to the kernel?
> 
> The chatty aspects really depend on the above results.
> 
> Aren't there some archs where an unaligned access would actually crash?
> Why hasn't that happened?

I've gone down through memory lane and we have discussed this before.

It would seem this misalignment should not affect performance, and this
should not be an issue unless you have a buggy exception hanlder. We
actually ran into one before. Please refer to merge commit

e74acdf55da6649dd30be5b621a93b71cbe7f3f9

  Luis

Re: [PATCH] modules: wait do_free_init correctly

2023-12-20 Thread Luis Chamberlain

On Wed, Dec 20, 2023 at 01:27:51PM +0800, Changbin Du wrote:
> On Tue, Dec 19, 2023 at 01:52:03PM -0800, Luis Chamberlain wrote:
> > On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote:
> > > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du  
> > > wrote:
> > > 
> > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves
> > > > do_free_init() into a global workqueue instead of call_rcu(). So now
> > > > we should wait it via flush_work().
> > > 
> > > What are the runtime effects of this change?
> > 
> > Indeed that's needed given how old this culprit commit is:
> > 
> > git describe --contains 1a7b7d922081
> > v5.2-rc1~192^2~5
> > 
> > Who did this work and for what reason? What triggered this itch?
> >
> Seems the waiting was introduced by commit ae646f0b9ca ("init: fix false 
> positives
> in W+X checking").
> 
> As what I have observed, mark_readonly() is only invoked by the first user 
> mode
> thread function kernel_init(), which is before userspace /init. So is it real
> possible we have loaded modules at this point?

Are you saying we don't free any module inits at all then? I asked a lot
of questions and your answers seem slim.

How did you find this?
What actual impact does this have without the patch?

The commit must document this.

  Luis

Re: [PATCH] modules: wait do_free_init correctly

2023-12-19 Thread Luis Chamberlain

On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote:
> On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du  wrote:
> 
> > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves
> > do_free_init() into a global workqueue instead of call_rcu(). So now
> > we should wait it via flush_work().
> 
> What are the runtime effects of this change?

Indeed that's needed given how old this culprit commit is:

git describe --contains 1a7b7d922081
v5.2-rc1~192^2~5

Who did this work and for what reason? What triggered this itch?

Is it perhaps for an out of tree driver that did something funky
on its module exit?

As per Documentation/RCU/rcubarrier.rst rcu_barrier will ensure the
callbacks complete, so interms of determinism both mechanisms will
have waited for the free. It seems we're now just limiting the scope.

This could also mean initialization grew used to having RCU calls on
init complete at this point in time, even for modules, and so localizing
this wait may now also introduce other unexpected behaviour.

  Luis

Re: [PATCH 0/4] Section alignment issues?

2023-12-19 Thread Luis Chamberlain

On Wed, Nov 22, 2023 at 11:18:10PM +0100, del...@kernel.org wrote:
> From: Helge Deller 
> My questions:
> - Am I wrong with my analysis?

This would typically of course depend on the arch, but whether it helps
is what I would like to see with real numbers rather then speculation.
Howeer, I don't expect some of these are hot paths except maybe the
table lookups. So could you look at some perf stat differences
without / with alignment ? Other than bootup live patching would be
a good test case. We have selftest for modules, the script in selftests
tools/testing/selftests/kmod/kmod.sh is pretty aggressive, but the live
patching tests might be better suited.

> - What does people see on other architectures?
> - Does it make sense to add a compile- and runtime-check, like the patch 
> below, to the kernel?

The chatty aspects really depend on the above results.

Aren't there some archs where an unaligned access would actually crash?
Why hasn't that happened?

  Luis

Re: [PATCH] init: move THIS_MODULE from to

2023-12-04 Thread Luis Chamberlain

On Sun, Nov 26, 2023 at 04:19:14PM +0900, Masahiro Yamada wrote:
> Commit f50169324df4 ("module.h: split out the EXPORT_SYMBOL into
> export.h") appropriately separated EXPORT_SYMBOL into 
> because modules and EXPORT_SYMBOL are orthogonal; modules are symbol
> consumers, while EXPORT_SYMBOL are used by symbol providers, which
> may not be necessarily a module.
> 
> However, that commit also relocated THIS_MODULE. As explained in the
> commit description, the intention was to define THIS_MODULE in a
> lightweight header, but I do not believe  was the
> suitable location because EXPORT_SYMBOL and THIS_MODULE are unrelated.
> 
> Move it to another lightweight header, . The reason for
> choosing  is to make  self-contained
> without relying on  incorrectly including
> .
> 
> With this adjustment, the role of  becomes clearer as
> it only defines EXPORT_SYMBOL.
> 
> Signed-off-by: Masahiro Yamada 

Reviewed-by: Luis Chamberlain 

Do you want this this to go through modules-next or your tree? I'm fine
it goes either way.

  Luis

Re: [PATCH 2/3] modpost: Extended modversion support

2023-11-16 Thread Luis Chamberlain

On Wed, Nov 15, 2023 at 06:50:10PM +, Matthew Maurer wrote:
> Adds a new format for modversions which stores each field in a separate
> elf section.

The "why" is critical and not mentioned. And I'd like to also see
documented this with foresight, if Rust needed could this be used
in the future for other things?

Also please include folks CC'd in *one* patch to *all* patches as
otherwise we have no context.

> This initially adds support for variable length names, but
> could later be used to add additional fields to modversions in a
> backwards compatible way if needed.
> 
> Adding support for variable length names makes it possible to enable
> MODVERSIONS and RUST at the same time.
> 
> Signed-off-by: Matthew Maurer 
> ---
>  arch/powerpc/kernel/module_64.c | 24 +-

Why was only powerpc modified? If the commit log explained this it would
make it easier for review.

> diff --git a/kernel/module/internal.h b/kernel/module/internal.h
> index c8b7b4dcf782..0c188c96a045 100644
> --- a/kernel/module/internal.h
> +++ b/kernel/module/internal.h
> @@ -80,7 +80,7 @@ struct load_info {
>   unsigned int used_pages;
>  #endif
>   struct {
> - unsigned int sym, str, mod, vers, info, pcpu;
> + unsigned int sym, str, mod, vers, info, pcpu, vers_ext_crc, 
> vers_ext_name;

We might as well modify this in a preliminary patch to add each new
unsinged int in a new line, so that it is easier to blame when each new
entry gets added. It should not grow the size of the struct at all but
it would make futur extensions easier to review what is new and git
blame easier to spot when something was added.

Although we don't use this extensively today this can easily grow for
convenience and making code easier to read.

> diff --git a/kernel/module/version.c b/kernel/module/version.c
> index 53f43ac5a73e..93d97dad8c77 100644
> --- a/kernel/module/version.c
> +++ b/kernel/module/version.c
> @@ -19,11 +19,28 @@ int check_version(const struct load_info *info,
>   unsigned int versindex = info->index.vers;
>   unsigned int i, num_versions;
>   struct modversion_info *versions;
> + struct modversion_info_ext version_ext;
>  
>   /* Exporting module didn't supply crcs?  OK, we're already tainted. */
>   if (!crc)
>   return 1;
>  
> + /* If we have extended version info, rely on it */
> + if (modversion_ext_start(info, _ext) >= 0) {

There are two things we need to do to make processing modules easier:

  1) ELF validation
  2) Once checked then process the information

We used to have this split up but also had a few places which did both
1) and 2) together. This was wrong and so I want to keep things tidy
and ensure we do things which validate the ELF separate. To that
end please put the checks to validate the ELF first so that we report
to users with a proper error/debug check in case the ELF is wrong,
this enables futher debug checks for that to be done instead of
confusing users who end up scratching their heads why something
failed.

So please split up the ELF validation check and put that into
elf_validity_cache_copy() which runs *earlier* than this.

Then *if* if has this, you just process it. Please take care to be
very pedantic in the elf_validity_cache_copy() and extend the checks
you have for validation in modversion_ext_start() and bring them to
elf_validity_cache_copy() with perhaps *more* stuff which does any
insane checks to verify it is 100% correct.

> + do {
> + if (strncmp(version_ext.name.value, symname,
> + version_ext.name.end - 
> version_ext.name.value) != 0)
> + continue;
> +
> + if (*version_ext.crc.value == *crc)
> + return 1;
> + pr_debug("Found checksum %X vs module %X\n",
> +  *crc, *version_ext.crc.value);
> + goto bad_version;
> + } while (modversion_ext_advance(_ext) == 0);

Can you do a for_each_foo()) type loop here instead after validation?
Because the validation would ensure your loop is bounded then. Look at
for_each_mod_mem_type() for inspiration.

> + goto broken_toolchain;

The broken toolchain thing would then be an issue reported in the
ELF validation.

> @@ -87,6 +105,65 @@ int same_magic(const char *amagic, const char *bmagic,
>   return strcmp(amagic, bmagic) == 0;
>  }
>  
> +#define MODVERSION_FIELD_START(sec, field) \
> + field.value = (typeof(field.value))sec.sh_addr; \
> + field.end = field.value + sec.sh_size
> +
> +ssize_t modversion_ext_start(const struct load_info *info,
> +  struct modversion_info_ext *start)
> +{
> + unsigned int crc_idx = info->index.vers_ext_crc;
> + unsigned int name_idx = info->index.vers_ext_name;
> + Elf_Shdr *sechdrs = info->sechdrs;
> +
> + // Both of these fields are needed for this to be

[GIT PULL] Modules changes for v6.7-rc1

2023-11-01 Thread Luis Chamberlain

The following changes since commit 7d461b291e65938f15f56fe58da2303b07578a76:

  Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm 
(2023-11-01 06:28:35 -1000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/ 
tags/modules-6.7-rc1

for you to fetch changes up to ea0b0bcef4917a2640ecc100c768b8e785784834:

  module: Annotate struct module_notes_attrs with __counted_by (2023-11-01 
13:07:32 -0700)


Modules changes for v6.7-rc1

The only thing worth highligthing is that gzip moves to use vmalloc() instead of
kmalloc just as we had a fix for this for zstd on v6.6-rc1. The rest is regular
house keeping, keeping things neat, tidy, and boring.

Oh and this has been on linux-next for over a month.


Andrea Righi (1):
  module/decompress: use vmalloc() for gzip decompression workspace

Kees Cook (2):
  module: Clarify documentation of module_param_call()
  module: Annotate struct module_notes_attrs with __counted_by

Luis Chamberlain (1):
  MAINTAINERS: add include/linux/module*.h to modules

Tiezhu Yang (2):
  module: Make is_mapping_symbol() return bool
  module: Make is_valid_name() return bool

Zhu Mao (1):
  module: Fix comment typo

 MAINTAINERS   | 2 +-
 include/linux/module_symbol.h | 2 +-
 include/linux/moduleparam.h   | 6 +-
 kernel/module/decompress.c| 4 ++--
 kernel/module/stats.c | 2 +-
 kernel/module/sysfs.c | 2 +-
 scripts/mod/modpost.c | 4 ++--
 7 files changed, 13 insertions(+), 9 deletions(-)

Re: [PATCH 5/5] modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules

2023-10-18 Thread Luis Chamberlain

On Wed, Oct 18, 2023 at 07:31:46AM +0200, Christoph Hellwig wrote:
> On Wed, Oct 18, 2023 at 01:30:18AM +0100, David Woodhouse wrote:
> > 
> > But if we're going to tolerate the core kernel still exporting some
> > stuff with EXPORT_SYMBOL, why isn't OK for a GPL-licensed module do to
> > the same? Even an *in-tree* GPL-licensed module now can't export
> > functionality with EXPORT_SYMBOL and have it used with symbol_get().
> 
> Anything using symbol_get is by intent very deeply internal for tightly
> coupled modules working together, and thus not a non-GPL export.
> 
> In fact the current series is just a stepping stone.  Once some mess
> in the kvm/vfio integration is fixed up we'll require a new explicit
> EXPORT_SYMBOL variant as symbol_get wasn't ever intended to be used
> on totally random symbols not exported for use by symbol_get.

The later patches in the series also show we could resolves most
uses through Kconfig and at build time, it really begs the question
if we even need it for any real valid uses.

  Luis

Re: [PATCH v3] module: Add CONFIG_MODULE_DISABLE_INIT_FREE option

2023-10-12 Thread Luis Chamberlain

On Thu, Oct 12, 2023 at 09:50:27AM -0700, Luis Chamberlain wrote:
> > In the kernel, selecting the CONFIG_MODULE_DISABLE_INIT_FREE
> > option disables the asynchronous freeing of init sections.
> 
> No it does not.

I take it back, your patch effectively only does this.

  Luis

Re: [PATCH v4] module: Add CONFIG_MODULE_DISABLE_INIT_FREE option

2023-10-12 Thread Luis Chamberlain

On Thu, Oct 12, 2023 at 07:17:19AM +0530, Joey Jiao wrote:
>  
> +config MODULE_DISABLE_INIT_FREE
> + bool "Disable freeing of init sections"
> + default n
> + help
> +   By default, kernel will free init sections after module being fully
> +   loaded.
> +
> +   MODULE_DISABLE_INIT_FREE allows users to prevent the freeing of init
> +   sections. This option is particularly helpful for syzkaller fuzzing,
> +   ensuring that the module consistently loads into the same address
> +   across reboots.

How and why does not free'ing init help with syzkaller exactly? I don't
see the relationship between not free'ing init and ensuring th emodule
loads into the same address. There could be many things which could
incur an address gets allocated from a module at another location which
a module can take. I cannot fathom how this simple toggle could ensure
modules following the address allocations accross reboots. That seems
like odd chance, not something actually deterministic.

> +
>  endif # MODULES
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 98fedfdb8db5..0f242b7b29fe 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2593,7 +2593,8 @@ static noinline int do_init_module(struct module *mod)
>* be cleaned up needs to sync with the queued work - ie
>* rcu_barrier()
>*/
> - if (llist_add(>node, _free_list))
> + if (llist_add(>node, _free_list) &&
> + !IS_ENABLED(CONFIG_MODULE_DISABLE_INIT_FREE))
>   schedule_work(_free_wq);

Re: [PATCH v3] module: Add CONFIG_MODULE_DISABLE_INIT_FREE option

2023-10-12 Thread Luis Chamberlain

On Thu, Oct 12, 2023 at 07:10:11AM +0530, Joey Jiao wrote:
> To facilitate syzkaller test, it's essential for the module to retain the same
> address across reboots.

Why?

> In userspace, the execution of modprobe commands must
> occur sequentially.

Why?

> In the kernel, selecting the CONFIG_MODULE_DISABLE_INIT_FREE
> option disables the asynchronous freeing of init sections.

No it does not.

> Signed-off-by: Joey Jiao 
> ---
>  kernel/module/Kconfig | 8 
>  kernel/module/main.c  | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 33a2e991f608..1cdbee4c51de 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -389,4 +389,12 @@ config MODULES_TREE_LOOKUP
>   def_bool y
>   depends on PERF_EVENTS || TRACING || CFI_CLANG
>  
> +config MODULE_DISABLE_INIT_FREE
> + bool "Disable freeing of init sections"
> + default n
> + help
> +   Allows users to prevent the freeing of init sections. This option is
> +   particularly helpful for syzkaller fuzzing, ensuring that the module
> +   consistently loads into the same address across reboots.
> +
>  endif # MODULES
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 98fedfdb8db5..a5210b90c078 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2593,8 +2593,9 @@ static noinline int do_init_module(struct module *mod)
>* be cleaned up needs to sync with the queued work - ie
>* rcu_barrier()
>*/
> - if (llist_add(>node, _free_list))
> - schedule_work(_free_wq);
> + if (llist_add(>node, _free_list) &&
> + !IS_ENABLED(CONFIG_MODULE_DISABLE_INIT_FREE))
> + schedule_work(_free_wq);

llist_add() returns true if the list was empty prior to adding the
entry, so the functionality you are adding makes no sense with the
commit log in any way shape or form.

  Luis

Re: [PATCH v2] module: Add CONFIG_MODULE_LOAD_IN_SEQUENCE option

2023-10-11 Thread Luis Chamberlain

On Wed, Oct 11, 2023 at 01:14:38PM +0530, Joey Jiao wrote:
> When modprobe cmds are executed one by one, the final loaded modules
> are not in fixed sequence as expected.
> 
> Add the option to make sure modules are in fixed sequence across reboot.
> 
> Signed-off-by: Joey Jiao 
> ---
>  kernel/module/Kconfig | 11 +++
>  kernel/module/main.c  |  3 ++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 33a2e991f608..b45a45f31d6d 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -389,4 +389,15 @@ config MODULES_TREE_LOOKUP
>   def_bool y
>   depends on PERF_EVENTS || TRACING || CFI_CLANG
>  
> +config MODULE_LOAD_IN_SEQUENCE
> + bool "Load module in sequence"
> + default n
> + help
> +   By default, modules are loaded in random sequence depending on when 
> modprobe
> +   is executed.
> +
> +   This option allows modules to be loaded in sequence if modprobe cmds 
> are
> +   executed one by one in sequence. This option is helpful during 
> syzkaller fuzzing
> +   to make sure module is loaded into fixed address across device reboot.
> +
>  endif # MODULES
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 98fedfdb8db5..e238a31d09eb 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2594,7 +2594,8 @@ static noinline int do_init_module(struct module *mod)
>* rcu_barrier()
>*/
>   if (llist_add(>node, _free_list))
> - schedule_work(_free_wq);
> + if (!IS_ENABLED(CONFIG_MODULE_LOAD_IN_SEQUENCE)) {
> + schedule_work(_free_wq);
>  

As Christoph suggested the rationale for something like this needs to be
clearly spelled out in the commit log and if so valuable it should be
a default. The commit log and even the Kconfig description do little
to justify any rationale for this.

  Luis

Re: [PATCH] module: Add CONFIG_MODULE_LOAD_IN_SEQUENCE option

2023-10-10 Thread Luis Chamberlain

Please find a good email client to reply to patches.

On Wed, Oct 11, 2023 at 01:57:58AM +, Joey Jiao (QUIC) wrote:
> Hi Luis,
> 
> > How is ignoring an error ensuring ordering?
> The change is just to disable the schedule_work.

That's different and can be made clearer. Try:

if (!IS_ENABLED(CONFIG_FOO))
schedule_stuff

> > Why are you making this only now be called with this new kconfig option?
> This sequence loading is especially helpful for syzkaller coverage decoding.
> When kaslr is disabled, address inside core kernel is fixed, so syzkaller can 
> always get right function/line number from addr2line.
> But module address keeps change across rebooting, in first booting, it might 
> be loaded at X1, and at X2 after reboot, and at X3 after another reboot.
> In this way, syzkaller just can't decode correctly for module address. And 
> syzkaller currently uses PC and branch info for coverage guided things.
> 
> There was a discussion previously here 
> https://groups.google.com/g/syzkaller/c/1Pnm_BjrZO8/m/WOyAKx8ZAgAJ for 
> modprobe.

You are missing my point, you are disabling in effect a piece of code
where it was not before.

  Luis

Re: [PATCH v2 0/5] params: harden string ops and allocatio ops

2023-10-10 Thread Luis Chamberlain

On Mon, Oct 02, 2023 at 09:57:59AM -0700, Kees Cook wrote:
> On Mon, Oct 02, 2023 at 03:48:51PM +0300, Andy Shevchenko wrote:
> > A couple of patches are for get the string ops, used in the module,
> > slightly harden. On top a few cleanups.
> > 
> > Since the main part is rather hardening, I think the Kees' tree is
> > the best fit for the series, but I'm open for another option(s).
> > 
> > Changelog v2:
> > - dropped the s*printf() --> sysfs_emit() conversion as it revealed
> >   an issue, i.e. reuse getters with non-page-aligned pointer, which
> >   would be addressed separately
> > - added cover letter and clarified the possible route for the series
> >   (Luis)
> > 
> > Andy Shevchenko (5):
> >   params: Introduce the param_unknown_fn type
> >   params: Do not go over the limit when getting the string length
> >   params: Use size_add() for kmalloc()
> >   params: Sort headers
> >   params: Fix multi-line comment style
> 
> Seems like a nice bit of clean-up.
> 
> Reviewed-by: Kees Cook 

Reviewed-by: Luis Chamberlain 

  Luis

Re: [PATCH] module: Add CONFIG_MODULE_LOAD_IN_SEQUENCE option

2023-10-10 Thread Luis Chamberlain

On Mon, Oct 09, 2023 at 10:26:35AM +0530, Joey Jiao wrote:
> When modprobe cmds are executed one by one, the final loaded modules
> are not in fixed sequence as expected.
> 
> Add the option to make sure modules are in fixed sequence across reboot.
> 
> Signed-off-by: Joey Jiao 
> ---
>  kernel/module/Kconfig | 11 +++
>  kernel/module/main.c  |  6 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 33a2e991f608..b45a45f31d6d 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -389,4 +389,15 @@ config MODULES_TREE_LOOKUP
>   def_bool y
>   depends on PERF_EVENTS || TRACING || CFI_CLANG
>  
> +config MODULE_LOAD_IN_SEQUENCE
> + bool "Load module in sequence"
> + default n
> + help
> +   By default, modules are loaded in random sequence depending on when 
> modprobe
> +   is executed.
> +
> +   This option allows modules to be loaded in sequence if modprobe cmds 
> are
> +   executed one by one in sequence. This option is helpful during 
> syzkaller fuzzing
> +   to make sure module is loaded into fixed address across device reboot.
> +
>  endif # MODULES
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 98fedfdb8db5..587fd84083ae 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2593,11 +2593,17 @@ static noinline int do_init_module(struct module *mod)
>* be cleaned up needs to sync with the queued work - ie
>* rcu_barrier()
>*/
> +#ifdef CONFIG_MODULE_LOAD_IN_SEQUENCE
> + llist_add(>node, _free_list);
> +#else
>   if (llist_add(>node, _free_list))
>   schedule_work(_free_wq);
> +#endif

How is ignoring an error ensuring ordering?

>   mutex_unlock(_mutex);
> +#ifdef CONFIG_MODULE_LOAD_IN_SEQUENCE
>   wake_up_all(_wq);
> +#endif

Why are you making this only now be called with this new kconfig option?

  Luis

Re: [PATCH v3 0/7] sysctl: Remove sentinel elements from arch

2023-10-10 Thread Luis Chamberlain

On Mon, Oct 02, 2023 at 01:30:35PM +0200, Joel Granados via B4 Relay wrote:
> V3:
> * Removed the ia64 patch to avoid conflicts with the ia64 removal
> * Rebased onto v6.6-rc4
> * Kept/added the trailing comma for the ctl_table arrays. This was a comment
>   that we received "drivers/*" patch set.
> * Link to v2: 
> https://lore.kernel.org/r/20230913-jag-sysctl_remove_empty_elem_arch-v2-0-d1bd13a29...@samsung.com

Thanks! I replaced the v2 with this v3 on sysctl-next.

  Luis

Re: [PATCH] module: Fix comment typo

2023-09-20 Thread Luis Chamberlain

On Sun, Jun 18, 2023 at 10:58:43PM +0800, zhumao...@208suo.com wrote:
> 
> Delete duplicated word in comment.
> 
> Signed-off-by: Zhu Mao 
> ---

Applied and pushed, next time please consider sending a slew of typo
fixes, just one word is a bit too much trouble for just one patch.

  Luis

Re: [PATCH 0/2] module: Do some small changes

2023-09-20 Thread Luis Chamberlain

On Fri, Jun 16, 2023 at 09:51:31AM +0800, Tiezhu Yang wrote:
> The first patch is suggested by Xi Zhang  offline,
> the second patch is inspired by the first patch, no functional changes.

Applied and pushed, thanks!

  Luis

Re: [PATCH] module/decompress: use vmalloc() for gzip decompression workspace

2023-09-20 Thread Luis Chamberlain

On Wed, Aug 30, 2023 at 05:58:20PM +0200, Andrea Righi wrote:
> Use a similar approach as commit a419beac4a07 ("module/decompress: use
> vmalloc() for zstd decompression workspace") and replace kmalloc() with
> vmalloc() also for the gzip module decompression workspace.
> 
> In this case the workspace is represented by struct inflate_workspace
> that can be fairly large for kmalloc() and it can potentially lead to
> allocation errors on certain systems:
> 
> $ pahole inflate_workspace
> struct inflate_workspace {
>   struct inflate_state   inflate_state;/* 0  9544 */
>   /* --- cacheline 149 boundary (9536 bytes) was 8 bytes ago --- */
>   unsigned char  working_window[32768]; /*  9544 32768 */
> 
>   /* size: 42312, cachelines: 662, members: 2 */
>   /* last cacheline: 8 bytes */
> };
> 
> Considering that there is no need to use continuous physical memory,
> simply switch to vmalloc() to provide a more reliable in-kernel module
> decompression.
> 
> Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
> Signed-off-by: Andrea Righi 

Applied, and pushed, thanks!

  Luis

[PATCH] MAINTAINERS: add include/linux/module*.h to modules

2023-09-20 Thread Luis Chamberlain

Use glob include/linux/module*.h to capture all module changes.

Suggested-by: Kees Cook 
Signed-off-by: Luis Chamberlain 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3be1bdfe8ecc..b78cf8dc7f16 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14289,7 +14289,7 @@ L:  linux-kernel@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git 
modules-next
 F: include/linux/kmod.h
-F: include/linux/module.h
+F: include/linux/module*.h
 F: kernel/module/
 F: lib/test_kmod.c
 F: scripts/module*
-- 
2.39.2

Re: [PATCH v2 0/8] sysctl: Remove sentinel elements from arch

2023-09-20 Thread Luis Chamberlain

On Wed, Sep 13, 2023 at 11:10:54AM +0200, Joel Granados via B4 Relay wrote:
> V2:
> * Added clarification both in the commit messages and the coverletter as
>   to why this patch is safe to apply.
> * Added {Acked,Reviewed,Tested}-by from list
> * Link to v1: 
> https://lore.kernel.org/r/20230906-jag-sysctl_remove_empty_elem_arch-v1-0-3935d4854...@samsung.com

Thanks! I've merged this onto sysctl-next.

  Luis

Re: [RFC v2 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing

2021-04-20 Thread Luis Chamberlain

On Tue, Apr 20, 2021 at 01:59:03PM +0100, Christoph Hellwig wrote:
> > This also removes all the superflous freezer calls on all filesystems
> > as they are no longer needed as the VFS now performs filesystem
> > freezing/thaw if the filesystem has support for it. The filesystem
> > therefore is in charge of properly dealing with quiescing of the
> > filesystem through its callbacks.
> 
> Can you split that out from the main logic change?  Maybe even into one
> patch per file system?

The issue with this is that once you do the changes in pm to
freeze/suspend, if you leave the other changes in for the filesystems
freeze / resume will stall, so all this needs to be an atomic operation
if we want bisectable kernels.

  Luis

Re: [PATCH 2/2] fs/kernel_read_file: use usermodehelper_read_trylock() as a stop gap

2021-04-19 Thread Luis Chamberlain

On Fri, Apr 16, 2021 at 11:58:50PM +, Luis Chamberlain wrote:
> The endless wait for the read which the piece of hardware never got
> stalls resume as sync calls are all asynchronous.

Sorry, this should read:

The endless wait for the read, which the piece of hardware never got,
stalls resume as all pm resume calls are serialized and synchronous.

  Luis

[RFC v2 4/6] fs: distinguish between user initiated freeze and kernel initiated freeze

2021-04-16 Thread Luis Chamberlain

Userspace can initiate a freeze call using ioctls. If the kernel decides
to freeze a filesystem later it must be able to distinguish if userspace
had initiated the freeze, so that it does not unfreeze it later
automatically on resume.

Likewise if the kernel is initiating a freeze on its own it should *not*
fail to freeze a filesystem if a user had already frozen it on our behalf.
This same concept applies to thawing, even if its not possible for
userspace to beat the kernel in thawing a filesystem. This logic however
has never applied to userspace freezing and thawing, two consecutive
userspace freeze calls will results in only the first one succeeding, so
we must retain the same behaviour in userspace.

This doesn't implement yet kernel initiated filesystem freeze calls,
this will be done in subsequent calls. This change should introduce
no functional changes, it just extends the definitions a frozen
filesystem to account for future kernel initiated filesystem freeze.

Signed-off-by: Luis Chamberlain 
---
 fs/super.c | 27 ++-
 include/linux/fs.h | 17 +++--
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 744b2399a272..53106d4c7f56 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -40,7 +40,7 @@
 #include 
 #include "internal.h"
 
-static int thaw_super_locked(struct super_block *sb);
+static int thaw_super_locked(struct super_block *sb, bool usercall);
 
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
@@ -977,7 +977,7 @@ static void do_thaw_all_callback(struct super_block *sb)
down_write(>s_umount);
if (sb->s_root && sb->s_flags & SB_BORN) {
emergency_thaw_bdev(sb);
-   thaw_super_locked(sb);
+   thaw_super_locked(sb, false);
} else {
up_write(>s_umount);
}
@@ -1625,10 +1625,13 @@ static void sb_freeze_unlock(struct super_block *sb)
 }
 
 /* Caller takes lock and handles active count */
-static int freeze_locked_super(struct super_block *sb)
+static int freeze_locked_super(struct super_block *sb, bool usercall)
 {
int ret;
 
+   if (!usercall && sb_is_frozen(sb))
+   return 0;
+
if (!sb_is_unfrozen(sb))
return -EBUSY;
 
@@ -1673,7 +1676,10 @@ static int freeze_locked_super(struct super_block *sb)
 * For debugging purposes so that fs can warn if it sees write activity
 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 */
-   sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   if (usercall)
+   sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   else
+   sb->s_writers.frozen = SB_FREEZE_COMPLETE_AUTO;
return 0;
 }
 
@@ -1717,7 +1723,7 @@ int freeze_super(struct super_block *sb)
atomic_inc(>s_active);
 
down_write(>s_umount);
-   error = freeze_locked_super(sb);
+   error = freeze_locked_super(sb, true);
if (error) {
deactivate_locked_super(sb);
goto out;
@@ -1731,10 +1737,13 @@ int freeze_super(struct super_block *sb)
 EXPORT_SYMBOL(freeze_super);
 
 /* Caller deals with the sb->s_umount */
-static int __thaw_super_locked(struct super_block *sb)
+static int __thaw_super_locked(struct super_block *sb, bool usercall)
 {
int error;
 
+   if (!usercall && sb_is_unfrozen(sb))
+   return 0;
+
if (!sb_is_frozen(sb))
return -EINVAL;
 
@@ -1763,11 +1772,11 @@ static int __thaw_super_locked(struct super_block *sb)
 }
 
 /* Handles unlocking of sb->s_umount for you */
-static int thaw_super_locked(struct super_block *sb)
+static int thaw_super_locked(struct super_block *sb, bool usercall)
 {
int error;
 
-   error = __thaw_super_locked(sb);
+   error = __thaw_super_locked(sb, usercall);
if (error) {
up_write(>s_umount);
return error;
@@ -1787,6 +1796,6 @@ static int thaw_super_locked(struct super_block *sb)
 int thaw_super(struct super_block *sb)
 {
down_write(>s_umount);
-   return thaw_super_locked(sb);
+   return thaw_super_locked(sb, true);
 }
 EXPORT_SYMBOL(thaw_super);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3dcf2c1968e5..6980e709e94a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1406,9 +1406,10 @@ enum {
SB_FREEZE_FS = 3,   /* For internal FS use (e.g. to stop
 * internal threads if needed) */
SB_FREEZE_COMPLETE = 4, /* ->freeze_fs finished successfully */
+   SB_FREEZE_COMPLETE_AUTO = 5,/* same but initiated automatically */
 };
 
-#define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE - 1)
+#define SB_FREEZE_LEVELS (SB_FREEZE_COMPLETE_AUTO - 2)
 
 struct sb_writers {
int frozen; /* Is sb

[RFC v2 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing

2021-04-16 Thread Luis Chamberlain

Add support to automatically handle freezing and thawing filesystems
during the kernel's suspend/resume cycle.

This is needed so that we properly really stop IO in flight without
races after userspace has been frozen. Without this we rely on
kthread freezing and its semantics are loose and error prone.
For instance, even though a kthread may use try_to_freeze() and end
up being frozen we have no way of being sure that everything that
has been spawned asynchronously from it (such as timers) have also
been stopped as well.

A long term advantage of also adding filesystem freeze / thawing
supporting during suspend / hibernation is that long term we may
be able to eventually drop the kernel's thread freezing completely
as it was originally added to stop disk IO in flight as we hibernate
or suspend.

This also removes all the superflous freezer calls on all filesystems
as they are no longer needed as the VFS now performs filesystem
freezing/thaw if the filesystem has support for it. The filesystem
therefore is in charge of properly dealing with quiescing of the
filesystem through its callbacks.

This also implies that many kthread users exist which have been
adding freezer semantics onto its kthreads without need. These also
will need to be reviewed later.

This is based on prior work originally by Rafael Wysocki and later by
Jiri Kosina.

The following Coccinelle rule was used as to remove the now superflous
freezer calls:

spatch --sp-file fs-freeze-cleanup.cocci --in-place fs/

@ has_freeze_fs @
identifier super_ops;
expression freeze_op;
@@

struct super_operations super_ops = {
.freeze_fs = freeze_op,
};

@ remove_set_freezable depends on has_freeze_fs @
expression time;
statement S, S2;
expression task, current;
@@

(
-   set_freezable();
|
-   if (try_to_freeze())
-   continue;
|
-   try_to_freeze();
|
-   freezable_schedule();
+   schedule();
|
-   freezable_schedule_timeout(time);
+   schedule_timeout(time);
|
-   if (freezing(task)) { S }
|
-   if (freezing(task)) { S }
-   else
{ S2 }
|
-   freezing(current)
)

@ remove_wq_freezable @
expression WQ_E, WQ_ARG1, WQ_ARG2, WQ_ARG3, WQ_ARG4;
identifier fs_wq_fn;
@@

(
WQ_E = alloc_workqueue(WQ_ARG1,
-  WQ_ARG2 | WQ_FREEZABLE,
+  WQ_ARG2,
   ...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
-  WQ_ARG2 | WQ_FREEZABLE | WQ_ARG3,
+  WQ_ARG2 | WQ_ARG3,
   ...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
-  WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE,
+  WQ_ARG2 | WQ_ARG3,
   ...);
|
WQ_E = alloc_workqueue(WQ_ARG1,
-  WQ_ARG2 | WQ_ARG3 | WQ_FREEZABLE | WQ_ARG4,
+  WQ_ARG2 | WQ_ARG3 | WQ_ARG4,
   ...);
|
WQ_E =
-   WQ_ARG1 | WQ_FREEZABLE
+   WQ_ARG1
|
WQ_E =
-   WQ_ARG1 | WQ_FREEZABLE | WQ_ARG3
+   WQ_ARG1 | WQ_ARG3
|
fs_wq_fn(
-   WQ_FREEZABLE | WQ_ARG2 | WQ_ARG3
+   WQ_ARG2 | WQ_ARG3
)
|
fs_wq_fn(
-   WQ_FREEZABLE | WQ_ARG2
+   WQ_ARG2
)
|
fs_wq_fn(
-   WQ_FREEZABLE
+   0
)
)

Signed-off-by: Luis Chamberlain 
---
 fs/btrfs/disk-io.c |  4 +-
 fs/btrfs/scrub.c   |  2 +-
 fs/cifs/cifsfs.c   | 10 ++---
 fs/cifs/dfs_cache.c|  2 +-
 fs/ext4/super.c|  2 -
 fs/f2fs/gc.c   |  7 +---
 fs/f2fs/segment.c  |  6 +--
 fs/gfs2/glock.c|  6 +--
 fs/gfs2/main.c |  4 +-
 fs/jfs/jfs_logmgr.c| 11 ++
 fs/jfs/jfs_txnmgr.c| 31 +--
 fs/nilfs2/segment.c| 48 ++-
 fs/super.c | 88 ++
 fs/xfs/xfs_log.c   |  3 +-
 fs/xfs/xfs_mru_cache.c |  2 +-
 fs/xfs/xfs_pwork.c |  2 +-
 fs/xfs/xfs_super.c | 14 +++
 fs/xfs/xfs_trans_ail.c |  7 +---
 include/linux/fs.h | 13 +++
 kernel/power/process.c | 15 ++-
 20 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9fc2ec72327f..2c718f1eaae3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2303,7 +2303,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info 
*fs_info,
struct btrfs_fs_devices *fs_devices)
 {
u32 max_active = fs_info->thread_pool_size;
-   unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+   unsigned int flags = WQ_MEM_RECLAIM | WQ_UNBOUND;
 
fs_info->workers =
btrfs_alloc_workqueue(fs_info, "worker",
@@ -2355,7 +2355,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info 
*fs_info,

[RFC v2 5/6] fs: add iterate_supers_excl() and iterate_supers_reverse_excl()

2021-04-16 Thread Luis Chamberlain

There are use cases where we wish to traverse the superblock list
but also capture errors, and in which case we want to avoid having
our callers issue a lock themselves since we can do the locking for
the callers. Provide a iterate_supers_excl() which calls a function
with the write lock held. If an error occurs we capture it and
propagate it.

Likewise there are use cases where we wish to traverse the superblock
list but in reverse order. The new iterate_supers_reverse_excl() helpers
does this but also also captures any errors encountered.

Reviewed-by: Jan Kara 
Signed-off-by: Luis Chamberlain 
---
 fs/super.c | 91 ++
 include/linux/fs.h |  2 +
 2 files changed, 93 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 53106d4c7f56..2a6ef4ec2496 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -705,6 +705,97 @@ void iterate_supers(void (*f)(struct super_block *, void 
*), void *arg)
spin_unlock(_lock);
 }
 
+/**
+ * iterate_supers_excl - exclusively call func for all active superblocks
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument. Returns 0 unless an error
+ * occurred on calling the function on any superblock.
+ */
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+{
+   struct super_block *sb, *p = NULL;
+   int error = 0;
+
+   spin_lock(_lock);
+   list_for_each_entry(sb, _blocks, s_list) {
+   if (hlist_unhashed(>s_instances))
+   continue;
+   sb->s_count++;
+   spin_unlock(_lock);
+
+   down_write(>s_umount);
+   if (sb->s_root && (sb->s_flags & SB_BORN)) {
+   error = f(sb, arg);
+   if (error) {
+   up_write(>s_umount);
+   spin_lock(_lock);
+   __put_super(sb);
+   break;
+   }
+   }
+   up_write(>s_umount);
+
+   spin_lock(_lock);
+   if (p)
+   __put_super(p);
+   p = sb;
+   }
+   if (p)
+   __put_super(p);
+   spin_unlock(_lock);
+
+   return error;
+}
+
+/**
+ * iterate_supers_reverse_excl - exclusively calls func in reverse order
+ * @f: function to call
+ * @arg: argument to pass to it
+ *
+ * Scans the superblock list and calls given function, passing it
+ * locked superblock and given argument, in reverse order, and holding
+ * the s_umount write lock. Returns if an error occurred.
+ */
+int iterate_supers_reverse_excl(int (*f)(struct super_block *, void *),
+void *arg)
+{
+   struct super_block *sb, *p = NULL;
+   int error = 0;
+
+   spin_lock(_lock);
+   list_for_each_entry_reverse(sb, _blocks, s_list) {
+   if (hlist_unhashed(>s_instances))
+   continue;
+   sb->s_count++;
+   spin_unlock(_lock);
+
+   down_write(>s_umount);
+   if (sb->s_root && (sb->s_flags & SB_BORN)) {
+   error = f(sb, arg);
+   if (error) {
+   up_write(>s_umount);
+   spin_lock(_lock);
+   __put_super(sb);
+   break;
+   }
+   }
+   up_write(>s_umount);
+
+   spin_lock(_lock);
+   if (p)
+   __put_super(p);
+   p = sb;
+   }
+   if (p)
+   __put_super(p);
+   spin_unlock(_lock);
+
+   return error;
+}
+
 /**
  * iterate_supers_type - call function for superblocks of given type
  * @type: fs type
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6980e709e94a..0f4d624f0f3f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3442,6 +3442,8 @@ extern struct super_block *get_active_super(struct 
block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void drop_super_exclusive(struct super_block *sb);
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
+extern int iterate_supers_excl(int (*f)(struct super_block *, void *), void 
*arg);
+extern int iterate_supers_reverse_excl(int (*)(struct super_block *, void *), 
void *);
 extern void iterate_supers_type(struct file_system_type *,
void (*)(struct super_block *, void *), void *);
 
-- 
2.29.2

[RFC v2 1/6] fs: provide unlocked helper for freeze_super()

2021-04-16 Thread Luis Chamberlain

freeze_super() holds a write lock, however we wish to also enable
callers which already hold the write lock. To do this provide a helper
and make freeze_super() use it. This way, all that freeze_super() does
now is lock handling and active count management.

This change has no functional changes.

Suggested-by: Dave Chinner 
Reviewed-by: Jan Kara 
Signed-off-by: Luis Chamberlain 
---
 fs/super.c | 100 +
 1 file changed, 55 insertions(+), 45 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 11b7e7213fd1..e24d0849d935 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1624,59 +1624,20 @@ static void sb_freeze_unlock(struct super_block *sb)
percpu_up_write(sb->s_writers.rw_sem + level);
 }
 
-/**
- * freeze_super - lock the filesystem and force it into a consistent state
- * @sb: the super to lock
- *
- * Syncs the super to make sure the filesystem is consistent and calls the fs's
- * freeze_fs.  Subsequent calls to this without first thawing the fs will 
return
- * -EBUSY.
- *
- * During this function, sb->s_writers.frozen goes through these values:
- *
- * SB_UNFROZEN: File system is normal, all writes progress as usual.
- *
- * SB_FREEZE_WRITE: The file system is in the process of being frozen.  New
- * writes should be blocked, though page faults are still allowed. We wait for
- * all writes to complete and then proceed to the next stage.
- *
- * SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
- * but internal fs threads can still modify the filesystem (although they
- * should not dirty new pages or inodes), writeback can run etc. After waiting
- * for all running page faults we sync the filesystem which will clean all
- * dirty pages and inodes (no new dirty pages or inodes can be created when
- * sync is running).
- *
- * SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs
- * modification are blocked (e.g. XFS preallocation truncation on inode
- * reclaim). This is usually implemented by blocking new transactions for
- * filesystems that have them and need this additional guard. After all
- * internal writers are finished we call ->freeze_fs() to finish filesystem
- * freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is
- * mostly auxiliary for filesystems to verify they do not modify frozen fs.
- *
- * sb->s_writers.frozen is protected by sb->s_umount.
- */
-int freeze_super(struct super_block *sb)
+/* Caller takes lock and handles active count */
+static int freeze_locked_super(struct super_block *sb)
 {
int ret;
 
-   atomic_inc(>s_active);
-   down_write(>s_umount);
-   if (sb->s_writers.frozen != SB_UNFROZEN) {
-   deactivate_locked_super(sb);
+   if (sb->s_writers.frozen != SB_UNFROZEN)
return -EBUSY;
-   }
 
-   if (!(sb->s_flags & SB_BORN)) {
-   up_write(>s_umount);
+   if (!(sb->s_flags & SB_BORN))
return 0;   /* sic - it's "nothing to do" */
-   }
 
if (sb_rdonly(sb)) {
/* Nothing to do really... */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-   up_write(>s_umount);
return 0;
}
 
@@ -1705,7 +1666,6 @@ int freeze_super(struct super_block *sb)
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb);
wake_up(>s_writers.wait_unfrozen);
-   deactivate_locked_super(sb);
return ret;
}
}
@@ -1714,9 +1674,59 @@ int freeze_super(struct super_block *sb)
 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 */
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+   return 0;
+}
+
+/**
+ * freeze_super - lock the filesystem and force it into a consistent state
+ * @sb: the super to lock
+ *
+ * Syncs the super to make sure the filesystem is consistent and calls the fs's
+ * freeze_fs.  Subsequent calls to this without first thawing the fs will 
return
+ * -EBUSY.
+ *
+ * During this function, sb->s_writers.frozen goes through these values:
+ *
+ * SB_UNFROZEN: File system is normal, all writes progress as usual.
+ *
+ * SB_FREEZE_WRITE: The file system is in the process of being frozen.  New
+ * writes should be blocked, though page faults are still allowed. We wait for
+ * all writes to complete and then proceed to the next stage.
+ *
+ * SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked
+ * but internal fs threads can still modify the filesystem (although they
+ * should not dirty new pages or inodes), writeback can run etc. After waiting
+ * for all running page faults we sync the filesystem which will clean all
+ * dirty pages and inodes (no new dirty pages or inodes can be created when
+ * sync is running).
+ *
+ *

[RFC v2 3/6] fs: add a helper for thaw_super_locked() which does not unlock

2021-04-16 Thread Luis Chamberlain

The thaw_super_locked() expects the caller to hold the sb->s_umount
semaphore. It also handles the unlocking of the semaphore for you.
Allow for cases where the caller will do the unlocking of the semaphore.

Signed-off-by: Luis Chamberlain 
---
 fs/super.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 72b445a69a45..744b2399a272 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1730,14 +1730,13 @@ int freeze_super(struct super_block *sb)
 }
 EXPORT_SYMBOL(freeze_super);
 
-static int thaw_super_locked(struct super_block *sb)
+/* Caller deals with the sb->s_umount */
+static int __thaw_super_locked(struct super_block *sb)
 {
int error;
 
-   if (!sb_is_frozen(sb)) {
-   up_write(>s_umount);
+   if (!sb_is_frozen(sb))
return -EINVAL;
-   }
 
if (sb_rdonly(sb)) {
sb->s_writers.frozen = SB_UNFROZEN;
@@ -1752,7 +1751,6 @@ static int thaw_super_locked(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
lockdep_sb_freeze_release(sb);
-   up_write(>s_umount);
return error;
}
}
@@ -1761,10 +1759,25 @@ static int thaw_super_locked(struct super_block *sb)
sb_freeze_unlock(sb);
 out:
wake_up(>s_writers.wait_unfrozen);
-   deactivate_locked_super(sb);
return 0;
 }
 
+/* Handles unlocking of sb->s_umount for you */
+static int thaw_super_locked(struct super_block *sb)
+{
+   int error;
+
+   error = __thaw_super_locked(sb);
+   if (error) {
+   up_write(>s_umount);
+   return error;
+   }
+
+   deactivate_locked_super(sb);
+
+   return 0;
+ }
+
 /**
  * thaw_super -- unlock filesystem
  * @sb: the super to thaw
-- 
2.29.2

[RFC v2 2/6] fs: add frozen sb state helpers

2021-04-16 Thread Luis Chamberlain

The question of whether or not a superblock is frozen needs to be
augmented in the future to account for differences between a user
initiated freeze and a kernel initiated freeze done automatically
on behalf of the kernel.

Provide helpers so that these can be used instead so that we don't
have to expand checks later in these same call sites as we expand
the definition of a frozen superblock.

Signed-off-by: Luis Chamberlain 
---
 fs/ext4/ext4_jbd2.c |  2 +-
 fs/super.c  |  4 ++--
 fs/xfs/xfs_trans.c  |  3 +--
 include/linux/fs.h  | 34 ++
 4 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index be799040a415..efda50563feb 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -72,7 +72,7 @@ static int ext4_journal_check_start(struct super_block *sb)
 
if (sb_rdonly(sb))
return -EROFS;
-   WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
+   WARN_ON(sb_is_frozen(sb));
journal = EXT4_SB(sb)->s_journal;
/*
 * Special case here: if the journal has aborted behind our
diff --git a/fs/super.c b/fs/super.c
index e24d0849d935..72b445a69a45 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1629,7 +1629,7 @@ static int freeze_locked_super(struct super_block *sb)
 {
int ret;
 
-   if (sb->s_writers.frozen != SB_UNFROZEN)
+   if (!sb_is_unfrozen(sb))
return -EBUSY;
 
if (!(sb->s_flags & SB_BORN))
@@ -1734,7 +1734,7 @@ static int thaw_super_locked(struct super_block *sb)
 {
int error;
 
-   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
+   if (!sb_is_frozen(sb)) {
up_write(>s_umount);
return -EINVAL;
}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index bcc978011869..b4669dd65c9e 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -272,8 +272,7 @@ xfs_trans_alloc(
 * Zero-reservation ("empty") transactions can't modify anything, so
 * they're allowed to run while we're frozen.
 */
-   WARN_ON(resp->tr_logres > 0 &&
-   mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
+   WARN_ON(resp->tr_logres > 0 && sb_is_frozen(mp->m_super));
ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) ||
   xfs_sb_version_haslazysbcount(>m_sb));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..3dcf2c1968e5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1885,6 +1885,40 @@ static inline bool sb_start_intwrite_trylock(struct 
super_block *sb)
return __sb_start_write_trylock(sb, SB_FREEZE_FS);
 }
 
+/**
+ * sb_is_frozen_by_user - is superblock frozen by a user call
+ * @sb: the super to check
+ *
+ * Returns true if the super freeze was initiated by userspace, for instance,
+ * an ioctl call.
+ */
+static inline bool sb_is_frozen_by_user(struct super_block *sb)
+{
+   return sb->s_writers.frozen == SB_FREEZE_COMPLETE;
+}
+
+/**
+ * sb_is_frozen - is superblock frozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is frozen.
+ */
+static inline bool sb_is_frozen(struct super_block *sb)
+{
+   return sb_is_frozen_by_user(sb);
+}
+
+/**
+ * sb_is_unfrozen - is superblock unfrozen
+ * @sb: the super to check
+ *
+ * Returns true if the super is unfrozen.
+ */
+static inline bool sb_is_unfrozen(struct super_block *sb)
+{
+   return sb->s_writers.frozen == SB_UNFROZEN;
+}
+
 bool inode_owner_or_capable(struct user_namespace *mnt_userns,
const struct inode *inode);
 
-- 
2.29.2

[RFC v2 0/6] vfs: provide automatic kernel freeze / resume

2021-04-16 Thread Luis Chamberlain

This picks up where I left off in 2017, I was inclined to respin this
up again due to a new issue Lukas reported which is easy to reproduce.
I posted a stop-gap patch for that issue [0] and a test case, however,
*this* is the work we want upstream to fix these sorts of issues.

As discussed long ago though at LSFMM, we have much work to do. However,
we can take small strides to get us closer. This is trying to take one
step. It addresses most of the comments and feedback from the last
series I posted, however it doesn't address the biggest two issues:

 o provide clean and clear semantics between userspace ioctls /
   automatic fs freeze, and freeze bdev. This also involves moving the
   counter stuff from bdev to the superblock. This is pending work.
 o The loopback hack which breaks the reverse ordering isn't addressed,
   perhaps just flagging it suffices for now?
 o The long term desirable DAG *is* plausable and I have an initial
   kernel graph implementation which I could use, but that may take
   longer to merge.

What this series *does* address based on the last series is:

  o Rebased onto linux-next tag next-20210415
  o Fixed RCU complaints. The issue was that I was adding new fs levels, and
this increated undesirably also the amount of rw semaphores, but we were
just using the new levels to distinguish *who* was triggering the suspend,
it was either inside the kernel and automatic, or triggered by userspace.
  o thaw_super_locked() was added but that unlocks the sb sb->s_umount,
our exclusive reverse iterate supers however will want to hold that
semaphore, so we provide a helper which lets the caller do the unlocking
for you, and make thaw_super_locked() a user of that.
  o WQ_FREEZABLE is now dealt with
  o I have folded the fs freezer removal stuff into the patch which adds
the automatic fs frezer / thaw work from the kernel as otherwise separting
this would create intermediate steps which would produce kernels which
can stall on suspend / resume.

[0] https://lkml.kernel.org/r/20210416235850.23690-1-mcg...@kernel.org
[1] https://lkml.kernel.org/r/20171129232356.28296-1-mcg...@kernel.org  
    

Luis Chamberlain (6):
  fs: provide unlocked helper for freeze_super()
  fs: add frozen sb state helpers
  fs: add a helper for thaw_super_locked() which does not unlock
  fs: distinguish between user initiated freeze and kernel initiated
freeze
  fs: add iterate_supers_excl() and iterate_supers_reverse_excl()
  fs: add automatic kernel fs freeze / thaw and remove kthread freezing

 fs/btrfs/disk-io.c |   4 +-
 fs/btrfs/scrub.c   |   2 +-
 fs/cifs/cifsfs.c   |  10 +-
 fs/cifs/dfs_cache.c|   2 +-
 fs/ext4/ext4_jbd2.c|   2 +-
 fs/ext4/super.c|   2 -
 fs/f2fs/gc.c   |   7 +-
 fs/f2fs/segment.c  |   6 +-
 fs/gfs2/glock.c|   6 +-
 fs/gfs2/main.c |   4 +-
 fs/jfs/jfs_logmgr.c|  11 +-
 fs/jfs/jfs_txnmgr.c|  31 ++--
 fs/nilfs2/segment.c|  48 +++---
 fs/super.c | 321 ++---
 fs/xfs/xfs_log.c   |   3 +-
 fs/xfs/xfs_mru_cache.c |   2 +-
 fs/xfs/xfs_pwork.c |   2 +-
 fs/xfs/xfs_super.c |  14 +-
 fs/xfs/xfs_trans.c |   3 +-
 fs/xfs/xfs_trans_ail.c |   7 +-
 include/linux/fs.h |  64 +++-
 kernel/power/process.c |  15 +-
 22 files changed, 405 insertions(+), 161 deletions(-)

-- 
2.29.2

[PATCH 2/2] fs/kernel_read_file: use usermodehelper_read_trylock() as a stop gap

2021-04-16 Thread Luis Chamberlain

The VFS lacks support to do automatic freeze / thaw of filesystems
on the suspend / resume cycle. This can cause some issues, namely
stalls when we have reads/writes during the suspend / resume cycle.

Although for module loading / kexec the probability of this happening
is extremely low, maybe even impossible, its a known real issue with
the request_firmare() API which it does direct fs read. For this reason
only be chatty about the issue on the call used by the firmware API.

Lukas Middendorf has reported an easy situation to reproduce, which can
be caused by questionably buggy drivers which call the request_firmware()
API on resume.

All you have to do is do the suspend / resume cycle using a driver
which will call request_firmware() on resume on a fresh XFS or
btrfs filesystem which has random files but the target file present:

for n in {1..1000}; do
dd if=/dev/urandom of=/lib/firmware/file$( printf %03d "$n" ).bin bs=1 
count=$((RANDOM + 1024 ))
done

You can reproduce this either with a buggy driver or by using the
test_firmware driver with the new hooks added to test this case.

No other request_firmware() calls can be triggered for this hang to work.
The hang occurs because:

request_firmware() -->
kernel_read_file_from_path_initns() -->
file_open_root() -->
  btrfs_lookup() --> ... -->
  btrfs_search_slot() --> read_block_for_search() -->
--> read_tree_block() --> btree_read_extent_buffer_pages() -->
--> submit_one_bio() --> btrfs_submit_metadata_bio() -->
--> btrfsic_submit_bio() --> submit_bio()
--> this completes and then
--> wait_on_page_locked() on the first page
--> never returns

The endless wait for the read which the piece of hardware never got
stalls resume as sync calls are all asynchronous.

The VFS fs freeze work fixes this issue, however it requires a bit
more work, which may take a while to land upstream, and so for now
this provides a simple stop-gap solution.

We can remove this stop-gap solution once the kernel gets VFS
fs freeze / thaw support.

Cc: raf...@kernel.org
Cc: j...@suse.cz
Cc: bvanass...@acm.org
Cc: ker...@tuxforce.de
Cc: mche...@kernel.org
Cc: keesc...@chromium.org
Reported-by: Lukas Middendorf 
Signed-off-by: Luis Chamberlain 
---
 fs/kernel_read_file.c | 37 +++--
 kernel/kexec_file.c   |  9 -
 kernel/module.c   |  8 +++-
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 90d255fbdd9b..f82d8534bf39 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * kernel_read_file() - read file contents into a kernel buffer
@@ -134,12 +135,20 @@ int kernel_read_file_from_path(const char *path, loff_t 
offset, void **buf,
if (!path || !*path)
return -EINVAL;
 
+   ret = usermodehelper_read_trylock();
+   if (ret)
+   return ret;
+
file = filp_open(path, O_RDONLY, 0);
-   if (IS_ERR(file))
+   if (IS_ERR(file)) {
+   usermodehelper_read_unlock();
return PTR_ERR(file);
+   }
 
ret = kernel_read_file(file, offset, buf, buf_size, file_size, id);
fput(file);
+
+   usermodehelper_read_unlock();
return ret;
 }
 EXPORT_SYMBOL_GPL(kernel_read_file_from_path);
@@ -160,13 +169,37 @@ int kernel_read_file_from_path_initns(const char *path, 
loff_t offset,
get_fs_root(init_task.fs, );
task_unlock(_task);
 
+   /*
+* Note: we may be incorrectly called on a driver's resume callback.
+*
+* The kernel's power management may be busy bringing us to suspend
+* or trying to get us back to resume. If we try to do a direct write
+* during this time a block driver may never get that request, and the
+* filesystem can wait forever. This requires proper VFS work, which
+* is not yet ready.
+*
+* Likewise busy trying here is not possible as well as we'd be holding
+* up the kernel's pm resume, and waiting will not allow use to thaw
+* the filesystem, we'd just wait forever. Best we can do is
+* communuicate the problem so that drivers use the firwmare cache or
+* implement their own prior to resume.
+*/
+   ret = usermodehelper_read_trylock();
+   if (ret) {
+   pr_warn_once("Trying direct fs read while not allowed");
+   return ret;
+   }
+
file = file_open_root(root.dentry, root.mnt, path, O_RDONLY, 0);
path_put();
-   if (IS_ERR(file))
+   if (IS_ERR(file)) {
+   usermodehelper_read_unlock();
return PTR_ERR(file);
+   }
 
ret = kernel_read_file(file, offset, buf, buf_size, file_size, id);
fput(file);
+

[PATCH 0/2] fs: provide a stop gap fix for buggy resume firmware API calls

2021-04-16 Thread Luis Chamberlain

Lukas has reported an issue [0] with a media driver which causes stall on
resume. At first his suspicion was that this issue was due to a bug in
btrfs. I managed to build a custom driver to reproduce the issue and
confirmed it was not a bug in btrfs. The issue is reproducible in XFS
as well. This issue is a demonstration of how broken the filesystem
suspend / resume cycle is and how easy it can be to trigger an issue.

By only doing reads with the firmware API used incorrectly, a simple
suspend / resume cycle can stall a system. The stall happens since
the hardware never gets read request issued by the filesystem as it
was already suspended. The fs waits forever. The stall also happens
because resume calls are synchronous and if one does not complete
we'll wait forever.

My new unposted VFS series for the fs freezer / resume work fixes this,
however this series will require a bit more discussion before this lands
upstream. And so this series provides a test case for the issue and an
intermediate stop-gap patch which resolves the issue for now. We can
remove this once the VFS freeze work lands upstream.

[0] https://lkml.kernel.org/r/c79e24a5-f808-d1f0-1f09-ee6f135d9...@tuxforce.de

Luis Chamberlain (2):
  test_firmware: add suspend support to test buggy drivers
  fs/kernel_read_file: use usermodehelper_read_trylock() as a stop gap

 fs/kernel_read_file.c |  37 -
 kernel/kexec_file.c   |   9 +-
 kernel/module.c   |   8 +-
 lib/test_firmware.c   | 136 --
 tools/testing/selftests/firmware/fw_lib.sh|   8 +-
 .../selftests/firmware/fw_test_resume.sh  |  80 +++
 6 files changed, 261 insertions(+), 17 deletions(-)
 create mode 100755 tools/testing/selftests/firmware/fw_test_resume.sh

-- 
2.29.2

[PATCH 1/2] test_firmware: add suspend support to test buggy drivers

2021-04-16 Thread Luis Chamberlain

Lukas Middendorf reported a situation where a driver's
request_firmware() call on resume caused a stall. Upon
inspection the issue was that the driver in question was
calling request_firmware() only on resume, and since we
currently do not have a generic kernel VFS freeze / thaw
solution in place we are allowing races for filesystems
to race against the disappearance of a block device, and
this is presently an issue which can lead to a stall.

It is difficult to reproduce this unless you have hardware
which mimics this setup. So to test this setup, let's just
implement support for doing these wacky things. This lets
us test that corner case easily as follows.

echo N > /sys/module/printk/parameters/console_suspend
modprobe test_firmware
echo 1 > /sys/devices/virtual/misc/test_firmware/config_enable_resume_test
systemctl suspend

Then call virsh dompmwakeup guest-id, on the guest and you
can reproduce the issue easily. The issue is reprodicible
with xfs and btrfs, and using a new partition for /lib/firmware
with a files created first as follows:

for n in {1..1000}; do
  dd if=/dev/urandom of=/lib/firmware/file$( printf %03d "$n" ).bin bs=1 
count=$((RANDOM + 1024 ))
done

Cc: raf...@kernel.org
Cc: j...@suse.cz
Cc: bvanass...@acm.org
Cc: ker...@tuxforce.de
Cc: mche...@kernel.org
Cc: keesc...@chromium.org
Signed-off-by: Luis Chamberlain 
---
 lib/test_firmware.c   | 136 --
 tools/testing/selftests/firmware/fw_lib.sh|   8 +-
 .../selftests/firmware/fw_test_resume.sh  |  80 +++
 3 files changed, 211 insertions(+), 13 deletions(-)
 create mode 100755 tools/testing/selftests/firmware/fw_test_resume.sh

diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index b6fe89add9fe..47ba6f4380ab 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -35,6 +36,8 @@ MODULE_IMPORT_NS(TEST_FIRMWARE);
 static DEFINE_MUTEX(test_fw_mutex);
 static const struct firmware *test_firmware;
 
+static struct platform_device *pdev;
+
 struct test_batched_req {
u8 idx;
int rc;
@@ -58,6 +61,9 @@ struct test_batched_req {
  * @sync_direct: when the sync trigger is used if this is true
  * request_firmware_direct() will be used instead.
  * @send_uevent: whether or not to send a uevent for async requests
+ * @enable_resume_test: if @senable_resume is true this will enable a test to
+ * issue a request_firmware() upon resume. This is useful to test resume
+ * after suspend filesystem races.
  * @num_requests: number of requests to try per test case. This is trigger
  * specific.
  * @reqs: stores all requests information
@@ -99,6 +105,7 @@ struct test_config {
bool partial;
bool sync_direct;
bool send_uevent;
+   bool enable_resume_test;
u8 num_requests;
u8 read_fw_idx;
 
@@ -195,6 +202,7 @@ static int __test_firmware_config_init(void)
test_fw_config->file_offset = 0;
test_fw_config->partial = false;
test_fw_config->sync_direct = false;
+   test_fw_config->enable_resume_test = false;
test_fw_config->req_firmware = request_firmware;
test_fw_config->test_result = 0;
test_fw_config->reqs = NULL;
@@ -275,6 +283,9 @@ static ssize_t config_show(struct device *dev,
len += scnprintf(buf + len, PAGE_SIZE - len,
"sync_direct:\t\t%s\n",
test_fw_config->sync_direct ? "true" : "false");
+   len += scnprintf(buf+len, PAGE_SIZE - len,
+   "enable_resume_test:\t\t%s\n",
+   test_fw_config->enable_resume_test ? "true" : "false");
len += scnprintf(buf + len, PAGE_SIZE - len,
"read_fw_idx:\t%u\n", test_fw_config->read_fw_idx);
 
@@ -538,6 +549,22 @@ static ssize_t config_sync_direct_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(config_sync_direct);
 
+static ssize_t config_enable_resume_test_store(struct device *dev,
+  struct device_attribute *attr,
+  const char *buf, size_t count)
+{
+   return test_dev_config_update_bool(buf, count,
+  _fw_config->enable_resume_test);
+}
+
+static ssize_t config_enable_resume_test_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   return test_dev_config_show_bool(buf, 
test_fw_config->enable_resume_test);
+}
+static DEVICE_ATTR_RW(config_enable_resume_test);
+
 static ssize_t config_send_uevent_store(struct device *dev,
struct device_attribute *attr,
c

Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()

2021-04-15 Thread Luis Chamberlain

On Thu, Apr 15, 2021 at 02:17:58PM -0400, Josef Bacik wrote:
> There's a lot of larger things that need to
> be addressed in general to support the volume approach inside file systems
> that is going to require a lot of work inside of VFS.  If you feel like
> tackling that work and then wiring up btrfs by all means have at it, but I'm
> not seeing a urgent need to address this.  Thanks,

That's precisely what I what I want to hear me about. Things like this.
Would btrfs be the ony user of volumes inside filesystem? Jeff had
mentioned before this could also allow namespaces per volumes, and this
might be a desirable feature.

What else?

 Luis

Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()

2021-04-15 Thread Luis Chamberlain

On Wed, Aug 23, 2017 at 3:31 PM Jeff Mahoney  wrote:
>
> On 8/15/14 5:29 AM, Al Viro wrote:
> > On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
> >
> >> Christoph had noted that this seemed associated to the problem
> >> that the btrfs uses different assignments for st_dev than s_dev,
> >> but much as I'd like to see that changed based on discussions so
> >> far its unclear if this is going to be possible unless strong
> >> commitment is reached.
>
> Resurrecting a dead thread since we've been carrying this patch anyway
> since then.
>
> > Explain, please.  Whose commitment and commitment to what, exactly?
> > Having different ->st_dev values for different files on the same
> > fs is a bloody bad idea; why does btrfs do that at all?  If nothing else,
> > it breaks the usual "are those two files on the same fs?" tests...
>
> It's because btrfs snapshots would have inode number collisions.
> Changing the inode numbers for snapshots would negate a big benefit of
> btrfs snapshots: the quick creation and lightweight on-disk
> representation due to metadata sharing.
>
> The thing is that ustat() used to work.  Your commit 0ee5dc676a5f8
> (btrfs: kill magical embedded struct superblock) had a regression:
> Since it replaced the superblock with a simple dev_t, it rendered the
> device no longer discoverable by user_get_super.  We need a list_head to
> attach for searching.
>
> There's an argument that this is hacky.  It's valid.  The only other
> feedback I've heard is to use a real superblock for subvolumes to do
> this instead.  That doesn't work either, due to things like freeze/thaw
> and inode writeback.  Ultimately, what we need is a single file system
> with multiple namespaces.  Years ago we just needed different inode
> namespaces, but as people have started adopting btrfs for containers, we
> need more than that.  I've heard requests for per-subvolume security
> contexts.  I'd imagine user namespaces are on someone's wish list.  A
> working df can be done with ->d_automount, but the way btrfs handles
> having a "canonical" subvolume location has always been a way to avoid
> directory loops.  I'd like to just automount subvolumes everywhere
> they're referenced.  One solution, for which I have no code yet, is to
> have something like a superblock-light that we can hang things like a
> security context, a user namespace, and an anonymous dev.  Most file
> systems would have just one.  Btrfs would have one per subvolume.
>
> That's a big project with a bunch of discussion.

4 years have gone by and this patch is still being carried around for
btrfs. Other than resolving this ustat() issue for btrfs are there new
reasons to support this effort done to be done properly? Are there
other filesystems that would benefit? I'd like to get an idea of the
stakeholder here before considering taking this on or not.

 Luis

Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback_sysfs

2021-04-14 Thread Luis Chamberlain

Shuah, a question for you toward the end here.

On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
> This use-after-free happens when a fw_priv object has been freed but
> hasn't been removed from the pending list (pending_fw_head). The next
> time fw_load_sysfs_fallback tries to insert into the list, it ends up
> accessing the pending_list member of the previoiusly freed fw_priv.
> 
> The root cause here is that all code paths that abort the fw load
> don't delete it from the pending list. For example:
> 
>   _request_firmware()
> -> fw_abort_batch_reqs()
> -> fw_state_aborted()
> 
> To fix this, delete the fw_priv from the list in __fw_set_state() if
> the new state is DONE or ABORTED. This way, all aborts will remove
> the fw_priv from the list. Accordingly, remove calls to list_del_init
> that were being made before calling fw_state_(aborted|done).
> 
> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
> list if it is already aborted. Instead, just jump out and return early.
> 
> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with 
> fw_load_sysfs_fallback")
> Reported-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> Tested-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 
> ---
> 
> Changes in v3:
> Modified the patch to incorporate suggestions by Luis Chamberlain in
> order to fix the root cause instead of applying a "band-aid" kind of
> fix.
> https://lore.kernel.org/lkml/20210403013143.gv4...@42.do-not-panic.com/
> 
> Changes in v2:
> 1. Fixed 1 error and 1 warning (in the commit message) reported by
> checkpatch.pl. The error was regarding the format for referring to
> another commit "commit  ("oneline")". The warning was for line
> longer than 75 chars. 
> 
> ---
>  drivers/base/firmware_loader/fallback.c | 8 ++--
>  drivers/base/firmware_loader/firmware.h | 6 +-
>  drivers/base/firmware_loader/main.c | 2 ++
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/firmware_loader/fallback.c 
> b/drivers/base/firmware_loader/fallback.c
> index 91899d185e31..73581b6998b4 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>   if (fw_sysfs_done(fw_priv))
>   return;
>  
> - list_del_init(_priv->pending_list);
>   fw_state_aborted(fw_priv);
>  }
>  
> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
>* Same logic as fw_load_abort, only the DONE bit
>* is ignored and we set ABORT only on failure.
>*/
> - list_del_init(_priv->pending_list);
>   if (rc) {
>   fw_state_aborted(fw_priv);
>   written = rc;
> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs 
> *fw_sysfs, long timeout)
>   }
>  
>   mutex_lock(_lock);
> + if (fw_state_is_aborted(fw_priv)) {
> + mutex_unlock(_lock);
> + retval = -EAGAIN;
> + goto out;
> + }

Thanks for the quick follow up!

This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
signal abort on fallback mechanism") which I had mentioned in my follow
up email you posted a link to. It would regress it since the condition
is just being met earlier and you nullify the effort. So essentially
on Android you would make not being able to detect signal handlers
like the SIGCHLD signal sent to init, if init was the same process
dealing with the sysfs fallback firmware upload.

The way I dealt with this in my patch was I decided to return -EINTR
in the earlier case in the hunk you added, instead of -EAGAIN. In
addition to this, later on fw_load_sysfs_fallback() when
fw_sysfs_wait_timeout() is used that would also deal with checking
for error codes on wait, and only then check if it was a signal
that cancelled things (the check for -ERESTARTSYS). We therefore
only send to userspace -EAGAIN when the wait really did hit the
timeout.

But also note that my change added a check for
fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
as that was a recently intended goal.

In either case I documented well *why* we do these error checks
before sending a code to userspace on fw_sysfs_wait_timeout() since
otherwise it would be easy to regress that code, so please also
document that as I did.

I'll re-iterate again also:

Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
_request_firmware_load(

Re: [PATCH v2] firmware_loader: fix use-after-free in firmware_fallback_sysfs

2021-04-13 Thread Luis Chamberlain

On Tue, Apr 13, 2021 at 04:12:42PM +0530, Anirudh Rayabharam wrote:
> The use-after-free happens when a fw_priv object has been freed but
> hasn't been removed from the pending list (pending_fw_head). The next
> time fw_load_sysfs_fallback tries to insert into the list, it ends up
> accessing the pending_list member of the previoiusly freed fw_priv.
> 
> In commit bcfbd3523f3c ("firmware: fix a double abort case with
> fw_load_sysfs_fallback"), fw_load_abort() is skipped if
> fw_sysfs_wait_timeout() returns -ENOENT. This causes the fw_priv to
> not be removed from the pending list.
> 
> To fix this, delete the fw_priv from the pending list when retval
> is -ENOENT instead of skipping the entire block.
> 
> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with 
> fw_load_sysfs_fallback")
> Reported-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> Tested-by: syzbot+de271708674e20930...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 

Thanks for your patch Anirudh, but please also see this reply to the
issue:

http://lkml.kernel.org/r/20210403013143.gv4...@42.do-not-panic.com

The way you patched the issue is just a band-aid, meaning we keep on
moving the issue further and it seems that's just the wrong approach.

Can you try the patch in that thread, to verify if the UAF goes away?

Re: [PATCH v2 1/1] kernel.h: Split out panic and oops helpers

2021-04-09 Thread Luis Chamberlain

On Fri, Apr 09, 2021 at 01:02:50PM +0300, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt to start cleaning it up by splitting out panic and
> oops helpers.
> 
> There are several purposes of doing this:
> - dropping dependency in bug.h
> - dropping a loop by moving out panic_notifier.h
> - unload kernel.h from something which has its own domain
> 
> At the same time convert users tree-wide to use new headers, although
> for the time being include new header back to kernel.h to avoid twisted
> indirected includes for existing users.
> 
> Signed-off-by: Andy Shevchenko 
> Reviewed-by: Bjorn Andersson 
> Acked-by: Mike Rapoport 
> Acked-by: Corey Minyard 
> Acked-by: Christian Brauner 
> Acked-by: Arnd Bergmann 
> Acked-by: Kees Cook 
> Acked-by: Wei Liu 
> Acked-by: Rasmus Villemoes 
> Signed-off-by: Andrew Morton 

Acked-by: Luis Chamberlain 

  Luis

Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate

2021-04-08 Thread Luis Chamberlain

On Thu, Apr 08, 2021 at 10:55:53AM +0200, Greg KH wrote:
> So to add crazy complexity to the kernel, 

I agree that this can be tricky. However, driver developers are going to
open code this either way. The problem with that as well, and one of my
own reasons for striving for at least *trying* for a generic solution,
was that I am aware that driver developers may be trying a busy solution
rather than the try method. The busy approach means you could also end
up in a situation where userspace can prevent full undoing / removal
of a driver. The try method is best effort in the sense that if the
driver is going it won't be allowed.

So a sensible consideration I think is we at least choose one of these:

 a) We allow driver developers to open code it on drivers which need it on
each and every single sysfs knob on the driver where its needed, and
accept folks might do it wrong

 b) Provide a macro which is opt-in, ie, not doing it for all
attributes, but perhaps one which the driver author *is* aware to
try / put of the driver method.

 c) Document this race and other races / failings so that driver
authors who do care about module removal are aware and know what
to do.

In this thread two races were exposed on syfs:

  * deadlock when a sysfs attribute uses a lock which is also used on
module __exit

  * possible race against the device's private data, and this is type
specific. I think many people probably missed the last hunks of my
proposed patch which added dev_type_get() which were not part of the
deadlock fix. At least I showed how attributes for all block devices
have an exposed race, which can crash if removal of a block device
with del_gendisk() happens while a sysfs attribute is about to be
used.

I don't think either race is trivial for a driver developer to assume a
solution for. Most focus on this thread was about the first race, the
seconod however is also very real, and likely can cause more issues on
rolling backs on error codes unrelated to rmmod...

> for an operation that can only
> be triggered manually by a root user, is not worth it in my opinion, as
> the maintainer of that code the complexity was asked to be made to.

Three things:

1) Many driver maintainers *do* care that rmmod works well. To the point
that if it does not, they feel ashamed. Reason for this is that in some
subsystems this *is* a typical test case. So although rmmod may be
a vodoo thing for core, many driver developers do want this to work
well.

As someone who also works on many test cases to expose odd issues in the
kernel unrelated to module removal, I can also say that module removal
does also expose other possible races which would otherwise be difficult
to trigger. So it is also a helfup aid as a developer working on
differen areas of the kernel.

2) Folks have pointed out that this is not just about rmmod, rolling
back removal of sysfs attributes due to error code paths is another
real scenario to consider. I don't think we have rules to tell userspace
to not muck with sysfs files after they are exposed. In fact those
uevents we send to userspace allow userspace to know exactly when to
muck with them.

3) Sadly, some sysfs attributes also purposely do things which *also*
mimic what is typically done on module removal, such as removal of an
interface, or block device. That was the case with zram, it allows
remvoal / reset of a device... Yes these are odd things to do, but we
allow for it. And sysfs users *do* need to be aware of these corner
cases if they want to support them.

There **may** be some severity to some of the issues mentioned above, to
allow really bad things to be done in userspace even without module
removal... but I didn't have time yet to expose a pattern with coccinelle
yet to see how commonplace some of these issue are. I was focusing at
first more for a generic solution if possible, as I thought that would
be better first evaluated, and to let others slowly become aware of the
issue.

  Luis

Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-07 Thread Luis Chamberlain

On Wed, Apr 07, 2021 at 05:59:19PM +0300, Andy Shevchenko wrote:
> On Wed, Apr 7, 2021 at 5:30 PM Luis Chamberlain  wrote:
> > On Wed, Apr 07, 2021 at 10:33:44AM +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 7, 2021 at 10:25 AM Luis Chamberlain  
> > > wrote:
> > > > On Tue, Apr 06, 2021 at 04:31:58PM +0300, Andy Shevchenko wrote:
> 
> ...
> 
> > > > Why is it worth it to add another file just for this?
> > >
> > > The main point is to break tons of loops that prevent having clean
> > > headers anymore.
> > >
> > > In this case, see bug.h, which is very important in this sense.
> >
> > OK based on the commit log this was not clear, it seemed more of moving
> > panic stuff to its own file, so just cleanup.
> 
> Sorry for that. it should have mentioned the kernel folder instead of
> lib. But I think it won't clarify the above.
> 
> In any case there are several purposes in this case
>  - dropping dependency in bug.h
>  - dropping a loop by moving out panic_notifier.h
>  - unload kernel.h from something which has its own domain
> 
> I think that you are referring to the commit message describing 3rd
> one, but not 1st and 2nd.

Right!

> I will amend this for the future splits, thanks!

Don't get me wrong, I love the motivation behind just the 3rd purpose,
however I figured there might be something more when I saw panic_notifier.h.
It was just not clear.

But awesome stuff!

  Luis

1 2 3 4 5 6 >

1 - 100 of 583 matches

Mail list logo