Re: [PATCH v3] lib: Convert test_printf.c to KUnit
On 03/11/2020 17.11, Petr Mladek wrote:
> On Tue 2020-11-03 12:52:23, Greg KH wrote:
>> On Tue, Nov 03, 2020 at 01:33:53PM +0200, Andy Shevchenko wrote:
>>> On Tue, Nov 03, 2020 at 04:40:49PM +0530, Arpitha Raghunandan wrote:

Convert test lib/test_printf.c to KUnit. More information about KUnit can
be found at:
https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
KUnit provides a common framework for unit tests in the kernel. KUnit and
kselftest are standardizing around KTAP; converting this test to KUnit
makes this test output in KTAP, which we are trying to make the standard
test result format for the kernel. More about the KTAP format can be
found at:
https://lore.kernel.org/linux-kselftest/cy4pr13mb1175b804e31e502221bc8163fd...@cy4pr13mb1175.namprd13.prod.outlook.com/.

I ran both the original and converted tests as is to produce the output
for success of the test in the two cases. I also ran these tests with a
small modification to show the difference in the output for failure of
the test in both cases. The modification I made is:

- test("127.000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr);
+ test("127-000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr);

Original test success:

[0.540860] test_printf: loaded.
[0.540863] test_printf: random seed = 0x5c46c33837bc0619
[0.541022] test_printf: all 388 tests passed

Original test failure:

[0.537980] test_printf: loaded.
[0.537983] test_printf: random seed = 0x1bc1efd881954afb
[0.538029] test_printf: vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
[0.538030] test_printf: kvasprintf(..., "%pi4|%pI4", ...) returned '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
[0.538124] test_printf: failed 2 out of 388 tests
[0.538125] test_printf: random seed used was 0x1bc1efd881954afb

Converted test success:

# Subtest: printf
1..25
ok 1 - test_basic
ok 2 - test_number
ok 3 - test_string
ok 4 - plain
ok 5 - null_pointer
ok 6 - error_pointer
ok 7 - invalid_pointer
ok 8 - symbol_ptr
ok 9 - kernel_ptr
ok 10 - struct_resource
ok 11 - addr
ok 12 - escaped_str
ok 13 - hex_string
ok 14 - mac
ok 15 - ip
ok 16 - uuid
ok 17 - dentry
ok 18 - struct_va_format
ok 19 - time_and_date
ok 20 - struct_clk
ok 21 - bitmap
ok 22 - netdev_features
ok 23 - flags
ok 24 - errptr
ok 25 - fwnode_pointer
ok 1 - printf

Converted test failure:

# Subtest: printf
1..25
ok 1 - test_basic
ok 2 - test_number
ok 3 - test_string
ok 4 - plain
ok 5 - null_pointer
ok 6 - error_pointer
ok 7 - invalid_pointer
ok 8 - symbol_ptr
ok 9 - kernel_ptr
ok 10 - struct_resource
ok 11 - addr
ok 12 - escaped_str
ok 13 - hex_string
ok 14 - mac
# ip: EXPECTATION FAILED at lib/printf_kunit.c:82
vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
# ip: EXPECTATION FAILED at lib/printf_kunit.c:124
kvasprintf(..., "%pi4|%pI4", ...) returned '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
not ok 15 - ip
ok 16 - uuid
ok 17 - dentry
ok 18 - struct_va_format
ok 19 - time_and_date
ok 20 - struct_clk
ok 21 - bitmap
ok 22 - netdev_features
ok 23 - flags
ok 24 - errptr
ok 25 - fwnode_pointer
not ok 1 - printf

>>>
>>> Better, indeed.
>>>
>>> But can this be improved to have cumulative statistics, like showing
>>> only the number of total, succeeded, failed, with details of the
>>> latter ones?
>>
>> Is that the proper test output format? We have a standard...

Actually more like xkcd#927.

> What is the standard, please?
>
> The original module produced:
>
> [ 48.577739] test_printf: loaded.
> [ 48.578046] test_printf: all 388 tests passed
>
> It comes from KSTM_MODULE_LOADERS() macro that has been created
> by the commit eebf4dd452377921e3a26 ("kselftest: Add test module
> framework header").

That's a bit of a simplification. That code was clearly lifted directly
from the original test_printf.c code
(707cc7280f452a162c52bc240eae62568b9753c2). test_bitmap.c cloned
test_printf.c (including a "Test cases for printf facility" comment...).
Later both were converted to use the KSTM header.

As the one coming up with that originally, I certainly endorse its use as
a standard way of producing a final, free-
Re: [EXT] Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 02/11/2020 22.22, Leo Li wrote:
>>> Where did you get this information that the register on LS1043 and
>>> LS1046 is bit reversed? I cannot find such information in the RM.
>>> And does this mean all other SCFG registers are also bit reversed? If
>>> this is some information that is not covered by the RM, we probably
>>> should clarify it in the code and the commit message.
>>
>> Hi Leo,
>>
>> I directly use the same logic to write the bit (field IRQ0~11INTP) of
>> the register SCFG_INTPCR in LS1043A and LS1046A. Such as, if I want to
>> control the polarity of IRQ0 (field IRQ0INTP, IRQ0 is active low) of
>> LS1043A/LS1046A, then I just need to write a value 1 << (31 - 0) to it.
>> The logic depends on the register's definition in LS1043A/LS1046A's RM.
>
> Ok. The SCFG_SCFGREVCR seems to be a one-off fixup only existing on
> LS1021. And it is mandatory to be bit-reversed according to the RM,
> which is already taken care of in the RCW. So the bit-reversed case
> should be the only case supported; otherwise a lot of other places
> doing SCFG access would fail.
>
> I think we should remove the bit_reverse thing altogether from the
> driver for good. This will prevent future confusion. Rasmus, what do
> you think?

Yes, all the ls1021a-derived boards I know of do have something like

  # Initialize bit reverse of SCFG registers
  09570200

in their pre-boot-loader config file. And yes, the RM does say

  This register must be written 0x_ as a part of initialization
  sequence before writing to any other SCFG register.

but nowhere does it say "or else...", nor a little honest addendum
"because we accidentally released broken silicon with this misfeature
_and_ wrong POR value".

Can we have an official statement from NXP stating that SCFGREVCR is a
hardware design bug? And can you send it through a time machine so I had
it three years ago, avoiding the whole "fsl,bit-reverse
device-tree-property; no, read the register if you're on a ls1021a and
decide" hullabaloo.

Rasmus
Re: [PATCH 0/4] deterministic random testing
On 26/10/2020 11.59, Andy Shevchenko wrote:
> On Sun, Oct 25, 2020 at 10:48:38PM +0100, Rasmus Villemoes wrote:
>> This is a bit of a mixed bag.
>>
>> The background is that I have some sort() and list_sort() rework
>> planned, but as part of that series I want to extend their test
>> suites somewhat to make sure I don't goof up - and I want to use lots
>> of random list lengths with random contents to increase the chance of
>> somebody eventually hitting "hey, sort() is broken when the length is
>> 3 less than a power of 2 and only the last two elements are out of
>> order". But when such a case is hit, it's vitally important that the
>> developer can reproduce the exact same test case, which means using a
>> deterministic sequence of random numbers.
>>
>> Since Petr noticed [1] the non-determinism in test_printf in
>> connection with Arpitha's work on rewriting it to kunit, this prompted
>> me to use test_printf as a first place to apply that principle, and
>> get the infrastructure in place that will avoid repeating the "module
>> parameter/seed the rnd_state/report the seed used" boilerplate in each
>> module.
>>
>> Shuah, assuming the kselftest_module.h changes are ok, I think it's
>> most natural if you carry these patches, though I'd be happy with any
>> other route as well.
>
> Completely in favour of this.
>
> Reviewed-by: Andy Shevchenko

Thanks.

> One note though. AFAIU the global variables are always being used in
> the modules that include the corresponding header. Otherwise we might
> have extra warning(s). I believe you have compiled with W=1 to exclude
> other cases.

Yes, I unconditionally define the two new variables. gcc doesn't warn
about them being unused, since they are referenced from inside an
if (0) {} block. And when those references are the only ones, gcc is
smart enough to elide the static variables completely, so they don't
even take up space in .data (or .init.data) - you can verify by running
nm on test_printf.o and test_bitmap.o - the former has 'seed' and
'rnd_state' symbols, the latter does not.

I did it that way to reduce the need for explicit preprocessor
conditionals inside C functions.

Rasmus
Re: [PATCH] seqlock: avoid -Wshadow warnings
On 28/10/2020 00.34, Arnd Bergmann wrote:
> On Mon, Oct 26, 2020 at 5:58 PM Peter Zijlstra wrote:
>>
>> On Mon, Oct 26, 2020 at 05:50:38PM +0100, Arnd Bergmann wrote:
>>
>>> -	unsigned seq;					\
>>> +	unsigned __seq;					\
>>
>>> -	unsigned seq = __read_seqcount_begin(s);	\
>>> +	unsigned _seq = __read_seqcount_begin(s);	\
>>
>>> -	unsigned seq = __seqcount_sequence(s);		\
>>> +	unsigned __seq = __seqcount_sequence(s);	\
>>
>> Can we have a consistent number of leading '_' ?
>
> Not really ;-)
>
> The warning comes from raw_read_seqcount_begin() calling
> __read_seqcount_begin() and both using the same variable
> name. I could rename one of them and use double-underscores
> for both, but I haven't come up with a good alternative name
> that wouldn't make it less consistent in the process.

At least x86's put_user and get_user use _pu/_gu suffixes on their local
variables, so perhaps that could be made a weak default convention?

  __seq_rsb
  __seq_rrsb
  __seq_rrs

Hm, or perhaps not. But it's still better than triplicating each macro
to do a UNIQUE_ID dance.

Rasmus
Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno
On 27/10/2020 10.12, Petr Mladek wrote:
> On Tue 2020-10-27 09:46:28, Arnd Bergmann wrote:
>> On Tue, Oct 27, 2020 at 8:23 AM Rasmus Villemoes wrote:
>>> On 26/10/2020 22.49, Arnd Bergmann wrote:
>>>> From: Arnd Bergmann
>>>
>>> NAK. That would end up using the "EDEADLOCK" string for the value 35
>>> on those architectures where they are the same, despite EDEADLK being
>>> by far the most used symbol. See the comments and original commit
>>> log; the placement of these is deliberate.
>
> Good point.
>
>> Ok, I see.
>>
>>> How about we do this instead?
>>>
>>> when building with W=1. As the use of multiple initializers for the
>>> same entry here is quite deliberate, explicitly disable that warning
>>> for errname.o.
>>>
>>> diff --git a/lib/Makefile b/lib/Makefile
>>> index ce45af50983a2a5e3582..a98119519e100103818d 100644
>>> --- a/lib/Makefile
>>> +++ b/lib/Makefile
>>> @@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
>>>
>>>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>>>  obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
>>> +CFLAGS_errname.o += $(call cc-disable-warning, override-init)
>>
>> This works, but it disables a potentially useful warning in case we
>> get another conflict in this file, so I'd prefer to find a way to
>> avoid the warning rather than force-disabling it.
>
> Yeah, I think that it is better to explicitly disable the less used
> variant in the code than hiding the double initialization. It will
> be clear what is going on.
>
>> How about adding the #ifdef around the EDEADLOCK line
>> instead of the EDEADLK one? Something like
>>
>> diff --git a/lib/errname.c b/lib/errname.c
>> index 0c4d3e66170e..93043fb960cc 100644
>> --- a/lib/errname.c
>> +++ b/lib/errname.c
>> @@ -38,7 +38,10 @@ static const char *names_0[] = {
>>  	E(ECOMM),
>>  	E(ECONNABORTED),
>>  	E(ECONNRESET),
>> +	E(EDEADLK), /* EDEADLOCK */
>> +#if EDEADLK != EDEADLOCK /* mips, sparc, powerpc */
>>  	E(EDEADLOCK),
>> +#endif
>>  	E(EDESTADDRREQ),
>>  	E(EDOM),
>>  	E(EDOTDOT),
>> @@ -169,7 +172,6 @@ static const char *names_0[] = {
>>  	E(ECANCELED), /* ECANCELLED */
>>  	E(EAGAIN), /* EWOULDBLOCK */
>>  	E(ECONNREFUSED), /* EREFUSED */
>> -	E(EDEADLK), /* EDEADLOCK */
>
> This should stay :-)

No, Arnd moved it next to EDEADLOCK, which is fine (it can lose the
comment /* EDEADLOCK */, though; the comment on the #if is sufficient).
Especially when:

> And we should remove the ECANCELLED definition. It is always the same
> as ECANCELED and replaced. We do not define EWOULDBLOCK and
> EREFUSED either.

Yes, I'm not sure why I elided EWOULDBLOCK and EREFUSED but not
ECANCELLED. So let's move EAGAIN, ECONNREFUSED and ECANCELED to their
proper alphabetic place.

But I also want to add a check that the things we've elided match some
value that we do handle. So add something like

#ifdef EREFUSED /* parisc */
static_assert(EREFUSED == ECONNREFUSED);
#endif
#ifdef ECANCELLED /* parisc */
static_assert(ECANCELLED == ECANCELED);
#endif
static_assert(EAGAIN == EWOULDBLOCK); /* everywhere */

so that if we ever import some arch that defines EREFUSED to something
other than ECONNREFUSED, it would be caught. Essentially, errname.c
should mention every #define E* that appears in any errno*.h.

Rasmus
Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 27/10/2020 05.46, Biwen Li wrote:
> From: Hou Zhiqiang
>
> Add a new IRQ chip declaration for LS1043A and LS1088A
> - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A. SCFG_INTPCR[31:0]
>   of these SoCs is stored/read as SCFG_INTPCR[0:31] defaultly (bit
>   reverse)

s/defaultly/by default/ I suppose. But what does that mean? Is it still
configurable, just now through some undocumented register? If that
register still exists, does it now have a reset value of all-ones as
opposed to the ls1021 case? If it's not configurable, then describing
the situation as "by default" is confusing and wrong; it should just say
"On LS1043A, LS1046A, SCFG_INTPCR is stored/read bit-reversed."

> - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA
>
> Signed-off-by: Hou Zhiqiang
> Signed-off-by: Biwen Li
> ---
> Change in v2:
> 	- add description of bit reverse
> 	- update copyright
>
>  drivers/irqchip/irq-ls-extirq.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c
> index 4d1179fed77c..9587bc2607fc 100644
> --- a/drivers/irqchip/irq-ls-extirq.c
> +++ b/drivers/irqchip/irq-ls-extirq.c
> @@ -1,5 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
> -
> +/*
> + * Author: Rasmus Villemoes

If I wanted my name splattered all over the files I touch or add, I'd
add it myself, TYVM. The git history is plenty fine for recording
authorship as far as I'm concerned, and I absolutely abhor having to
skip over any kind of legalese boilerplate when opening a file.

Rasmus
Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno
On 27/10/2020 08.23, Rasmus Villemoes wrote:
> How about we do this instead?
>
> From: Rasmus Villemoes

Missed:

Subject: lib/errname.o: disable -Woverride-init

> The table of errno value->name contains a few duplicate entries since
> e.g. EDEADLK == EDEADLOCK on most architectures. For the known cases,
> the most used symbolic constant is listed last so that takes
> precedence - the idea being that if someone sees "can't do that:
> -EDEADLK" in dmesg, grepping for EDEADLK is more likely to find the
> place where that error was generated (grepping for "can't do that"
> will find the printk() that emitted it, but the source would often be
> a few calls down).
>
> However, that means one gets
>
>   warning: initialized field overwritten [-Woverride-init]
>
> when building with W=1. As the use of multiple initializers for the
> same entry here is quite deliberate, explicitly disable that warning
> for errname.o.
>
> Reported-by: Arnd Bergmann
> Fixes: 57f5677e535b ("printf: add support for printing symbolic error names")
> Signed-off-by: Rasmus Villemoes
> ---
>  lib/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/Makefile b/lib/Makefile
> index ce45af50983a2a5e3582..a98119519e100103818d 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
>
>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>  obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
>  +CFLAGS_errname.o += $(call cc-disable-warning, override-init)
>
>  obj-$(CONFIG_NLATTR) += nlattr.o
> --

Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villem...@prevas.dk
www.prevas.dk
Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno
On 26/10/2020 22.49, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> On most architectures, gcc -Wextra warns about the list of error
> numbers containing both EDEADLK and EDEADLOCK:
>
> lib/errname.c:15:67: warning: initialized field overwritten [-Woverride-init]
>    15 | #define E(err) [err + BUILD_BUG_ON_ZERO(err <= 0 || err > 300)] = "-" #err
>       |                                                                   ^~~
> lib/errname.c:172:2: note: in expansion of macro 'E'
>   172 |  E(EDEADLK), /* EDEADLOCK */
>       |  ^
> lib/errname.c:15:67: note: (near initialization for 'names_0[35]')
>    15 | #define E(err) [err + BUILD_BUG_ON_ZERO(err <= 0 || err > 300)] = "-" #err
>       |                                                                   ^~~
> lib/errname.c:172:2: note: in expansion of macro 'E'
>   172 |  E(EDEADLK), /* EDEADLOCK */
>       |  ^
>
> Make that line conditional on the two values being distinct.

NAK. That would end up using the "EDEADLOCK" string for the value 35 on
those architectures where they are the same, despite EDEADLK being by
far the most used symbol. See the comments and original commit log; the
placement of these is deliberate.

How about we do this instead?

From: Rasmus Villemoes

The table of errno value->name contains a few duplicate entries since
e.g. EDEADLK == EDEADLOCK on most architectures. For the known cases,
the most used symbolic constant is listed last so that takes
precedence - the idea being that if someone sees "can't do that:
-EDEADLK" in dmesg, grepping for EDEADLK is more likely to find the
place where that error was generated (grepping for "can't do that"
will find the printk() that emitted it, but the source would often be
a few calls down).

However, that means one gets

  warning: initialized field overwritten [-Woverride-init]

when building with W=1. As the use of multiple initializers for the
same entry here is quite deliberate, explicitly disable that warning
for errname.o.

Reported-by: Arnd Bergmann
Fixes: 57f5677e535b ("printf: add support for printing symbolic error names")
Signed-off-by: Rasmus Villemoes
---
 lib/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Makefile b/lib/Makefile
index ce45af50983a2a5e3582..a98119519e100103818d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o

 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
+CFLAGS_errname.o += $(call cc-disable-warning, override-init)

 obj-$(CONFIG_NLATTR) += nlattr.o
--
2.23.0
Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt
On 26/10/2020 09.44, Marc Zyngier wrote:
> On 2020-10-26 08:01, Biwen Li wrote:
>> From: Hou Zhiqiang
>>
>> Add a new IRQ chip declaration for LS1043A and LS1088A
>> - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A
>> - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA
>
> Three things:
> - This commit message doesn't describe the bit_reverse change

Yeah, please elaborate on that, as the RM for 1043 or 1046 doesn't
mention anything about bit reversal for the scfg registers - they don't
seem to have the utter nonsense that is SCFG_SCFGREVCR, but perhaps,
instead of removing it, that has just become a hard-coded part of the
IP.

Also, IANAL etc., but

>> +// Copyright 2019-2020 NXP

really? Seems to be a bit of a stretch. At the very least, cc'ing the
original author and only person to ever touch that file would have been
appreciated.

Rasmus
[PATCH 2/4] kselftest_module.h: unconditionally expand the KSTM_MODULE_GLOBALS() macro
Two out of three users of the kselftest_module.h header manually define
the failed_tests/total_tests variables instead of making use of the
KSTM_MODULE_GLOBALS() macro. However, instead of just replacing those
definitions with an invocation of that macro, just unconditionally
define them in the header file itself.

A coming change will add a few more global variables, and at least one
of those will be referenced from kstm_report() - however, that's not
possible currently, since when the definition is postponed until the
test module invokes KSTM_MODULE_GLOBALS(), the variable is not defined
by the time the compiler parses kstm_report().

Signed-off-by: Rasmus Villemoes
---
 Documentation/dev-tools/kselftest.rst      | 2 --
 lib/test_bitmap.c                          | 3 ---
 lib/test_printf.c                          | 2 --
 lib/test_strscpy.c                         | 2 --
 tools/testing/selftests/kselftest_module.h | 5 ++---
 5 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index a901def730d95ca4c2c1..9899e86ed470ae527fdc 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -281,8 +281,6 @@ A bare bones test module might look like this:

    #include "../tools/testing/selftests/kselftest/module.h"

-   KSTM_MODULE_GLOBALS();
-
    /*
    * Kernel module for testing the foobinator
    */
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4425a1dd4ef1c7d85973..02fc667a9b3d5d7de7eb 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -16,9 +16,6 @@

 #include "../tools/testing/selftests/kselftest_module.h"

-static unsigned total_tests __initdata;
-static unsigned failed_tests __initdata;
-
 static char pbl_buffer[PAGE_SIZE] __initdata;

 static const unsigned long exp1[] __initconst = {
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 7ac87f18a10ff8209ad5..1ed4a27390cb621715ab 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -30,8 +30,6 @@
 #define PAD_SIZE 16
 #define FILL_CHAR '$'

-static unsigned total_tests __initdata;
-static unsigned failed_tests __initdata;
 static char *test_buffer __initdata;
 static char *alloced_buffer __initdata;

diff --git a/lib/test_strscpy.c b/lib/test_strscpy.c
index a827f94601f5d945b163..be477a52d87185ee6a01 100644
--- a/lib/test_strscpy.c
+++ b/lib/test_strscpy.c
@@ -10,8 +10,6 @@
  * Kernel module for testing 'strscpy' family of functions.
  */

-KSTM_MODULE_GLOBALS();
-
 /*
  * tc() - Run a specific test case.
  * @src: Source string, argument to strscpy_pad()
diff --git a/tools/testing/selftests/kselftest_module.h b/tools/testing/selftests/kselftest_module.h
index e8eafaf0941aa716d9dc..c81c0b0c054befaf665b 100644
--- a/tools/testing/selftests/kselftest_module.h
+++ b/tools/testing/selftests/kselftest_module.h
@@ -9,9 +9,8 @@
 * See Documentation/dev-tools/kselftest.rst for an example test module.
 */

-#define KSTM_MODULE_GLOBALS()			\
-static unsigned int total_tests __initdata;	\
-static unsigned int failed_tests __initdata
+static unsigned int total_tests __initdata;
+static unsigned int failed_tests __initdata;

 #define KSTM_CHECK_ZERO(x) do {	\
 	total_tests++;		\
--
2.23.0
[PATCH 1/4] prandom.h: add *_state variant of prandom_u32_max
It is useful for test modules that make use of random numbers to allow
the exact same series of test cases to be repeated (e.g., after fixing a
bug in the code being tested). For that, the test module needs to obtain
its random numbers from a private state that can be seeded with a known
seed, e.g. given as a module parameter (and using a random seed when
that parameter is not given).

There's a few test modules I'm going to modify to follow that scheme. As
preparation, add a _state variant of the existing prandom_u32_max(),
and for convenience, also add a variant that produces a value in a given
range.

Signed-off-by: Rasmus Villemoes
---
 include/linux/prandom.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e6468f91e79e1f31..58ffcd56c705be34fb98 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -46,6 +46,35 @@ static inline u32 prandom_u32_max(u32 ep_ro)
 	return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
 }

+/**
+ * prandom_u32_max_state - get pseudo-random number in interval [0, hi)
+ *
+ * Like prandom_u32_max, but use the given state structure.
+ * @state: pointer to state structure
+ * @hi: (exclusive) upper bound
+ *
+ * Exception: If @hi == 0, this returns 0.
+ */
+static inline u32 prandom_u32_max_state(struct rnd_state *state, u32 hi)
+{
+	return ((u64)prandom_u32_state(state) * hi) >> 32;
+}
+
+/**
+ * prandom_u32_range_state - get pseudo-random number in interval [lo, hi)
+ *
+ * @state: pointer to state structure
+ * @lo: (inclusive) lower bound
+ * @hi: (exclusive) upper bound
+ *
+ * Exception: If @lo == @hi, this returns @lo. Results are unspecified
+ * for @lo > @hi.
+ */
+static inline u32 prandom_u32_range_state(struct rnd_state *state, u32 lo, u32 hi)
+{
+	return lo + prandom_u32_max_state(state, hi - lo);
+}
+
 /*
  * Handle minimum values for seeds
  */
--
2.23.0
[PATCH 3/4] kselftest_module.h: add struct rnd_state and seed parameter
Some test suites make use of random numbers to increase the test
coverage when the test suite gets run on different machines and increase
the chance of some corner case bug being discovered - and I'm planning
on extending some existing ones in that direction as well. However,
should a bug be found this way, it's important that the exact same
series of tests can be repeated to verify the bug is fixed. That means
the random numbers must be obtained deterministically from a generator
private to the test module.

To avoid adding boilerplate to various test modules, put some logic into
kselftest_module.h: If the module declares that it will use random
numbers, add a "seed" module parameter. If not explicitly given when the
module is loaded (or via kernel command line), obtain a random one. In
either case, print the seed used, and repeat that information if there
was at least one test failing.

Signed-off-by: Rasmus Villemoes
---
 tools/testing/selftests/kselftest_module.h | 35 ++++++++++++++++++++--
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kselftest_module.h b/tools/testing/selftests/kselftest_module.h
index c81c0b0c054befaf665b..43f3ca58fcd550b8ac83 100644
--- a/tools/testing/selftests/kselftest_module.h
+++ b/tools/testing/selftests/kselftest_module.h
@@ -3,14 +3,31 @@
 #define __KSELFTEST_MODULE_H

 #include
+#include
+#include

 /*
  * Test framework for writing test modules to be loaded by kselftest.
  * See Documentation/dev-tools/kselftest.rst for an example test module.
  */

+/*
+ * If the test module makes use of random numbers, define KSTM_RANDOM
+ * to 1 before including this header. Then a module parameter "seed"
+ * will be defined. If not given, a random one will be obtained. In
+ * either case, the used seed is reported, so the exact same series of
+ * tests can be repeated by loading the module with that seed given.
+ */
+
+#ifndef KSTM_RANDOM
+#define KSTM_RANDOM 0
+#endif
+
 static unsigned int total_tests __initdata;
 static unsigned int failed_tests __initdata;
+static struct rnd_state rnd_state __initdata;
+static u64 seed __initdata;

 #define KSTM_CHECK_ZERO(x) do {	\
 	total_tests++;		\
@@ -22,11 +39,13 @@ static unsigned int failed_tests __initdata;

 static inline int kstm_report(unsigned int total_tests, unsigned int failed_tests)
 {
-	if (failed_tests == 0)
+	if (failed_tests == 0) {
 		pr_info("all %u tests passed\n", total_tests);
-	else
+	} else {
 		pr_warn("failed %u out of %u tests\n", failed_tests, total_tests);
-
+		if (KSTM_RANDOM)
+			pr_info("random seed used was 0x%016llx\n", seed);
+	}
 	return failed_tests ? -EINVAL : 0;
 }

@@ -34,6 +53,12 @@ static inline int kstm_report(unsigned int total_tests, unsigned int failed_test
 static int __init __module##_init(void)			\
 {							\
 	pr_info("loaded.\n");				\
+	if (KSTM_RANDOM) {				\
+		if (!seed)				\
+			seed = get_random_u64();	\
+		prandom_seed_state(&rnd_state, seed);	\
+		pr_info("random seed = 0x%016llx\n", seed); \
+	}						\
 	selftest();					\
 	return kstm_report(total_tests, failed_tests);	\
 }							\
@@ -44,4 +69,8 @@ static void __exit __module##_exit(void)	\
 module_init(__module##_init);			\
 module_exit(__module##_exit)

+#if KSTM_RANDOM
+module_param(seed, ullong, 0444);
+#endif
+
 #endif /* __KSELFTEST_MODULE_H */
--
2.23.0
[PATCH 4/4] lib/test_printf.c: use deterministic sequence of random numbers
The printf test suite does each test with a few different buffer sizes
to ensure vsnprintf() behaves correctly with respect to truncation and
size reporting. It calls vsnprintf() with a buffer size that is
guaranteed to be big enough, a buffer size of 0 to ensure that nothing
gets written to the buffer, but it also calls vsnprintf() with a buffer
size chosen to guarantee the output gets truncated somewhere in the
middle. That buffer size is chosen randomly to increase the chance of
finding some corner case bug (for example, there used to be some %p
extension that would fail to produce any output if there wasn't room
enough for it all, despite the requirement of producing as much as
there's room for).

I'm not aware of that having found anything yet, but should it happen,
it's annoying not to be able to repeat the test with the same sequence
of truncated lengths. For demonstration purposes, if we break one of the
test cases deliberately, we still get different buffer sizes if we don't
pass the seed parameter:

root@(none):/# modprobe test_printf
[ 15.317783] test_printf: vsnprintf(buf, 18, "%piS|%pIS", ...) wrote '127.000.000.001|1', expected '127-000.000.001|1'
[ 15.323182] test_printf: failed 3 out of 388 tests
[ 15.324034] test_printf: random seed used was 0x278bb9311979cc91
modprobe: ERROR: could not insert 'test_printf': Invalid argument
root@(none):/# modprobe test_printf
[ 13.940909] test_printf: vsnprintf(buf, 22, "%piS|%pIS", ...) wrote '127.000.000.001|127.0', expected '127-000.000.001|127.0'
[ 13.944744] test_printf: failed 3 out of 388 tests
[ 13.945607] test_printf: random seed used was 0x9f72eee1c9dc02e5
modprobe: ERROR: could not insert 'test_printf': Invalid argument

but to repeat a specific sequence of tests, we can do

root@(none):/# modprobe test_printf seed=0x9f72eee1c9dc02e5
[ 448.328685] test_printf: vsnprintf(buf, 22, "%piS|%pIS", ...) wrote '127.000.000.001|127.0', expected '127-000.000.001|127.0'
[ 448.331650] test_printf: failed 3 out of 388 tests
[ 448.332295] test_printf: random seed used was 0x9f72eee1c9dc02e5
modprobe: ERROR: could not insert 'test_printf': Invalid argument

Signed-off-by: Rasmus Villemoes
---
 lib/test_printf.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/test_printf.c b/lib/test_printf.c
index 1ed4a27390cb621715ab..bbea8b807d1eafe67e01 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -24,6 +24,7 @@

 #include

+#define KSTM_RANDOM 1
 #include "../tools/testing/selftests/kselftest_module.h"

 #define BUF_SIZE 256
@@ -111,8 +112,14 @@ __test(const char *expect, int elen, const char *fmt, ...)
 	 * be able to print it as expected.
 	 */
 	failed_tests += do_test(BUF_SIZE, expect, elen, fmt, ap);
-	rand = 1 + prandom_u32_max(elen+1);
-	/* Since elen < BUF_SIZE, we have 1 <= rand <= BUF_SIZE. */
+	rand = prandom_u32_range_state(&rnd_state, 1, elen + 1);
+	/*
+	 * Except for elen == 0, we have 1 <= rand <= elen < BUF_SIZE,
+	 * i.e., the output is guaranteed to be truncated somewhere in
+	 * the middle, and we're not pretending the buffer to be
+	 * larger than it really is. For the boring case of elen == 0,
+	 * rand is 1, which is of course also <= BUF_SIZE.
+	 */
 	failed_tests += do_test(rand, expect, elen, fmt, ap);
 	failed_tests += do_test(0, expect, elen, fmt, ap);

--
2.23.0
[PATCH 0/4] deterministic random testing
This is a bit of a mixed bag.

The background is that I have some sort() and list_sort() rework
planned, but as part of that series I want to extend their test suites
somewhat to make sure I don't goof up - and I want to use lots of random
list lengths with random contents to increase the chance of somebody
eventually hitting "hey, sort() is broken when the length is 3 less than
a power of 2 and only the last two elements are out of order". But when
such a case is hit, it's vitally important that the developer can
reproduce the exact same test case, which means using a deterministic
sequence of random numbers.

Since Petr noticed [1] the non-determinism in test_printf in connection
with Arpitha's work on rewriting it to kunit, this prompted me to use
test_printf as a first place to apply that principle, and get the
infrastructure in place that will avoid repeating the "module
parameter/seed the rnd_state/report the seed used" boilerplate in each
module.

Shuah, assuming the kselftest_module.h changes are ok, I think it's most
natural if you carry these patches, though I'd be happy with any other
route as well.

[1] https://lore.kernel.org/lkml/20200821113710.GA26290@alley/

Rasmus Villemoes (4):
  prandom.h: add *_state variant of prandom_u32_max
  kselftest_module.h: unconditionally expand the KSTM_MODULE_GLOBALS()
    macro
  kselftest_module.h: add struct rnd_state and seed parameter
  lib/test_printf.c: use deterministic sequence of random numbers

 Documentation/dev-tools/kselftest.rst      |  2 --
 include/linux/prandom.h                    | 29 ++++++++++++++
 lib/test_bitmap.c                          |  3 --
 lib/test_printf.c                          | 13 ++++---
 lib/test_strscpy.c                         |  2 --
 tools/testing/selftests/kselftest_module.h | 40 ++++++++++++++++---
 6 files changed, 72 insertions(+), 17 deletions(-)

--
2.23.0
[PATCH] watchdog: sbc_fitpc2_wdt: add __user annotations
After a change to the put_user() macro on x86, kernel test robot has started sending me complaints (from sparse) about passing kernel pointers to put_user(). So add the __user annotations to the various casts in fitpc2_wdt_ioctl(), and while in here, also make the write method actually match the prototype of file_operations::write. Reported-by: kernel test robot Signed-off-by: Rasmus Villemoes --- drivers/watchdog/sbc_fitpc2_wdt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/watchdog/sbc_fitpc2_wdt.c b/drivers/watchdog/sbc_fitpc2_wdt.c index 04483d6453d6a147703e..13db71e165836eb73249 100644 --- a/drivers/watchdog/sbc_fitpc2_wdt.c +++ b/drivers/watchdog/sbc_fitpc2_wdt.c @@ -78,7 +78,7 @@ static int fitpc2_wdt_open(struct inode *inode, struct file *file) return stream_open(inode, file); } -static ssize_t fitpc2_wdt_write(struct file *file, const char *data, +static ssize_t fitpc2_wdt_write(struct file *file, const char __user *data, size_t len, loff_t *ppos) { size_t i; @@ -125,16 +125,16 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned int cmd, switch (cmd) { case WDIOC_GETSUPPORT: - ret = copy_to_user((struct watchdog_info *)arg, &ident, + ret = copy_to_user((struct watchdog_info __user *)arg, &ident, sizeof(ident)) ? -EFAULT : 0; break; case WDIOC_GETSTATUS: - ret = put_user(0, (int *)arg); + ret = put_user(0, (int __user *)arg); break; case WDIOC_GETBOOTSTATUS: - ret = put_user(0, (int *)arg); + ret = put_user(0, (int __user *)arg); break; case WDIOC_KEEPALIVE: @@ -143,7 +143,7 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned int cmd, break; case WDIOC_SETTIMEOUT: - ret = get_user(time, (int *)arg); + ret = get_user(time, (int __user *)arg); if (ret) break; @@ -157,7 +157,7 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned int cmd, fallthrough; case WDIOC_GETTIMEOUT: - ret = put_user(margin, (int *)arg); + ret = put_user(margin, (int __user *)arg); break; } -- 2.23.0
[PATCH] kernel/sys.c: fix prototype of prctl_get_tid_address()
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in fact a "pointer to (pointer to int in userspace) in userspace". So sparse rightfully complains about passing a kernel pointer to put_user(). Reported-by: kernel test robot Signed-off-by: Rasmus Villemoes --- kernel/sys.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/sys.c b/kernel/sys.c index 6401880dff74a80a4594..85395f5cebc34d073cf4 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2238,12 +2238,12 @@ static int prctl_set_mm(int opt, unsigned long addr, } #ifdef CONFIG_CHECKPOINT_RESTORE -static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr) +static int prctl_get_tid_address(struct task_struct *me, int __user * __user *tid_addr) { return put_user(me->clear_child_tid, tid_addr); } #else -static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr) +static int prctl_get_tid_address(struct task_struct *me, int __user * __user *tid_addr) { return -EINVAL; } @@ -2427,7 +2427,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = prctl_set_mm(arg2, arg3, arg4, arg5); break; case PR_GET_TID_ADDRESS: - error = prctl_get_tid_address(me, (int __user **)arg2); + error = prctl_get_tid_address(me, (int __user * __user *)arg2); break; case PR_SET_CHILD_SUBREAPER: me->signal->is_child_subreaper = !!arg2; -- 2.23.0
[PATCH] drm/ttm: add __user annotation in radeon_ttm_vram_read
Keep sparse happy by preserving the __user annotation when casting. Reported-by: kernel test robot Signed-off-by: Rasmus Villemoes --- kernel test robot has already started spamming me due to 9c5743dff. If I don't fix those warnings I'll keep getting those emails for months, so let me do the easy ones. drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 36150b7f31a90aa1eece..ecfe88b0a35d8f317712 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -1005,7 +1005,7 @@ static ssize_t radeon_ttm_vram_read(struct file *f, char __user *buf, value = RREG32(RADEON_MM_DATA); spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - r = put_user(value, (uint32_t *)buf); + r = put_user(value, (uint32_t __user *)buf); if (r) return r; -- 2.23.0
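[Editor's note: the __user-annotation fixes in the two patches above follow one pattern: any cast of a userspace pointer must carry the annotation along. A minimal sketch of how that works, with made-up helper names - under a regular compiler __user expands to nothing, while sparse (which defines __CHECKER__) sees the address-space attribute and warns when a cast drops it.]

```c
/*
 * Sketch of the kernel's __user annotation: only sparse sees the
 * attribute; a normal build compiles it away. address_space(1) is the
 * classic userspace address-space number used by sparse.
 */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif

/* Hypothetical stand-in for put_user(): second argument must keep the
 * __user annotation through any cast, as in the patches above. */
static int put_user_sketch(int val, int __user *uptr)
{
	(void)val;
	(void)uptr;	/* a real kernel would copy val out to userspace */
	return 0;
}

static long ioctl_sketch(unsigned long arg)
{
	/* Cast via (int __user *), not plain (int *), so sparse is happy. */
	return put_user_sketch(0, (int __user *)arg);
}
```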
[PATCH 4.9-stable] scripts/setlocalversion: make git describe output more reliable
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream. When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. 
That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambiguously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. [Adapt to `` vs $() differences between 4.9 and upstream.] Signed-off-by: Rasmus Villemoes Signed-off-by: Masahiro Yamada --- scripts/setlocalversion | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/scripts/setlocalversion b/scripts/setlocalversion index aa28c3f29809314bfa58..0c8741b795d0c82a38c9 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -44,7 +44,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=`git rev-parse --verify --short HEAD 2>/dev/null`; then + head=$(git rev-parse --verify HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -58,11 +58,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="`git describe 2>/dev/null`"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' - - # If we don't have a tag at all we print -g{commitish}. + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' + + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi -- 2.23.0
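[Editor's note: the untagged-commit truncation in the patch above can be exercised in isolation. This is a sketch with a made-up hash; in the real script $head comes from `git rev-parse --verify HEAD`.]

```shell
# Stand-alone sketch of the -g{commitish} branch of the patch above.
head=deadbeefcafe0123456789ab	# hypothetical full-ish sha1

# Exactly 12 hex chars, independent of git version and core.abbrev:
short="$(echo $head | cut -c1-12)"
printf '%s%s\n' -g "$short"
# prints: -gdeadbeefcafe
```

Pinning the length this way is what keeps `uname -a` and the /lib/modules/ path in agreement across build hosts.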
[PATCH 4.4-stable] scripts/setlocalversion: make git describe output more reliable
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream. When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails. The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by (1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo (2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage) (3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, make sure the abbreviated sha1 always consists of exactly 12 hex characters. 
That is consistent with the current rule for -stable patches, and is almost always enough to identify the head commit unambiguously - in the few cases where it does not, the v5.4.3-00021- prefix would certainly nail it down. [Adapt to `` vs $() differences between 4.4 and upstream.] Signed-off-by: Rasmus Villemoes Signed-off-by: Masahiro Yamada --- scripts/setlocalversion | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/scripts/setlocalversion b/scripts/setlocalversion index aa28c3f29809314bfa58..0c8741b795d0c82a38c9 100755 --- a/scripts/setlocalversion +++ b/scripts/setlocalversion @@ -44,7 +44,7 @@ scm_version() # Check for git and a git repo. if test -z "$(git rev-parse --show-cdup 2>/dev/null)" && - head=`git rev-parse --verify --short HEAD 2>/dev/null`; then + head=$(git rev-parse --verify HEAD 2>/dev/null); then # If we are at a tagged commit (like "v2.6.30-rc6"), we ignore # it, because this version is defined in the top level Makefile. @@ -58,11 +58,22 @@ scm_version() fi # If we are past a tagged commit (like # "v2.6.30-rc5-302-g72357d5"), we pretty print it. - if atag="`git describe 2>/dev/null`"; then - echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}' - - # If we don't have a tag at all we print -g{commitish}. + # + # Ensure the abbreviated sha1 has exactly 12 + # hex characters, to make the output + # independent of git version, local + # core.abbrev settings and/or total number of + # objects in the current repository - passing + # --abbrev=12 ensures a minimum of 12, and the + # awk substr() then picks the 'g' and first 12 + # hex chars. + if atag="$(git describe --abbrev=12 2>/dev/null)"; then + echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}' + + # If we don't have a tag at all we print -g{commitish}, + # again using exactly 12 hex chars. else + head="$(echo $head | cut -c1-12)" printf '%s%s' -g $head fi fi -- 2.23.0
[PATCH] x86/uaccess: fix code generation in put_user()
Quoting https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html: You can define a local register variable and associate it with a specified register... The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm). This may be necessary if the constraints for a particular machine don't provide sufficient control to select the desired register. On 32-bit x86, this is used to ensure that gcc will put an 8-byte value into the %edx:%eax pair, while all other cases will just use the single register %eax (%rax on x86-64). While the _ASM_AX actually just expands to "%eax", note this comment next to get_user() which does something very similar: * The use of _ASM_DX as the register specifier is a bit of a * simplification, as gcc only cares about it as the starting point * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits * (%ecx being the next register in gcc's x86 register sequence), and * %rdx on 64 bits. However, getting this to work requires that there is no code between the assignment to the local register variable and its use as an input to the asm() which can possibly clobber any of the registers involved - including evaluation of the expressions making up other inputs. In the current code, the ptr expression used directly as an input may cause such code to be emitted. For example, Sean Christopherson observed that with KASAN enabled and ptr being current->set_child_tid (from schedule_tail()), the load of current->set_child_tid causes a call to __asan_load8() to be emitted immediately prior to the __put_user_4 call, and Naresh Kamboju reports that various mmstress tests fail on KASAN-enabled builds. It's also possible to synthesize a broken case without KASAN if one uses "foo()" as the ptr argument, with foo being some "extern u64 __user *foo(void);" (though I don't know if that appears in real code). 
Fix it by making sure ptr gets evaluated before the assignment to __val_pu, and add a comment that __val_pu must be the last thing computed before the asm() is entered. Cc: Sean Christopherson Reported-by: Naresh Kamboju Tested-by: Naresh Kamboju Fixes: d55564cfc222 ("x86: Make __put_user() generate an out-of-line call") Signed-off-by: Rasmus Villemoes --- I tried to put together a changelog to earn just a little claim-to-fame, feel free to edit, especially if I got something wrong. Did I miss any appropriate *-by tags? I'm wondering if one would also need to make __ptr_pu and __ret_pu explicitly "%"_ASM_CX". arch/x86/include/asm/uaccess.h | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index f13659523108..c9fa7be3df82 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -208,16 +208,24 @@ extern void __put_user_nocheck_2(void); extern void __put_user_nocheck_4(void); extern void __put_user_nocheck_8(void); +/* + * ptr must be evaluated and assigned to the temporary __ptr_pu before + * the assignment of x to __val_pu, to avoid any function calls + * involved in the ptr expression (possibly implicitly generated due + * to KASAN) from clobbering %ax. + */ #define do_put_user_call(fn,x,ptr) \ ({ \ int __ret_pu; \ + void __user *__ptr_pu; \ register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); \ __chk_user_ptr(ptr);\ + __ptr_pu = (ptr); \ __val_pu = (x); \ asm volatile("call __" #fn "_%P[size]" \ : "=c" (__ret_pu), \ ASM_CALL_CONSTRAINT \ -: "0" (ptr), \ +: "0" (__ptr_pu), \ "r" (__val_pu), \ [size] "i" (sizeof(*(ptr))) \ :"ebx"); \ -- 2.23.0
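[Editor's note: the sequencing guarantee the fix above relies on can be shown in plain userspace C. This is a sketch only - the names mirror do_put_user_call() but the register pinning, asm() and out-of-line __put_user_N calls are all elided, and the side-effect helpers are made up stand-ins for things like a KASAN-instrumented load hidden in the ptr expression.]

```c
static int order_log[2];
static int order_n;
static int storage;

/* Stand-in for a ptr expression that emits a call (e.g. __asan_load8). */
static int *side_effect_ptr(void)
{
	order_log[order_n++] = 1;	/* "call" generated for ptr */
	return &storage;
}

/* Stand-in for the value expression. */
static int side_effect_val(void)
{
	order_log[order_n++] = 2;	/* computed only afterwards */
	return 42;
}

/*
 * Shape of the fixed macro: evaluate ptr (and any calls hidden inside
 * it) into a temporary first, then compute the value, so nothing runs
 * between the value assignment and its use.  Uses a GNU statement
 * expression, like the real uaccess.h macro.
 */
#define do_put_user_sketch(x, ptr)		\
({						\
	int *__ptr_pu = (ptr);			\
	int __val_pu = (x);			\
	*__ptr_pu = __val_pu;			\
	0;					\
})
```

With the original ordering (__val_pu assigned before ptr is evaluated), the "call" in the ptr expression would run after the value is already sitting in %eax and clobber it; evaluating ptr first removes that window.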
Re: [PATCH v2] lib: Convert test_printf.c to KUnit
On 22/10/2020 21.16, Andy Shevchenko wrote: > On Thu, Oct 22, 2020 at 08:43:49PM +0530, Arpitha Raghunandan wrote: >> Convert test lib/test_printf.c to KUnit. More information about >> KUnit can be found at: >> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html. >> KUnit provides a common framework for unit tests in the kernel. >> KUnit and kselftest are standardizing around KTAP, converting this >> test to KUnit makes this test output in KTAP which we are trying to >> make the standard test result format for the kernel. More about >> the KTAP format can be found at: >> https://lore.kernel.org/linux-kselftest/cy4pr13mb1175b804e31e502221bc8163fd...@cy4pr13mb1175.namprd13.prod.outlook.com/. >> I ran both the original and converted tests as is to produce the >> output for success of the test in the two cases. I also ran these >> tests with a small modification to show the difference in the output >> for failure of the test in both cases. The modification I made is: >> - test("127.000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr); >> + test("127-000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr); >> >> Original test success: >> [0.591262] test_printf: loaded. >> [0.591409] test_printf: all 388 tests passed >> >> Original test failure: >> [0.619345] test_printf: loaded. >> [0.619394] test_printf: vsnprintf(buf, 256, "%piS|%pIS", ...) >> wrote '127.000.000.001|127.0.0.1', expected >> '127-000.000.001|127.0.0.1' >> [0.619395] test_printf: vsnprintf(buf, 25, "%piS|%pIS", ...) wrote >> '127.000.000.001|127.0.0.', expected '127-000.000.001|127.0.0.' >> [0.619396] test_printf: kvasprintf(..., "%piS|%pIS", ...) 
returned >> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' >> [0.619495] test_printf: failed 3 out of 388 tests >> >> Converted test success: >> # Subtest: printf-kunit-test >> 1..1 >> ok 1 - selftest >> ok 1 - printf-kunit-test >> >> Converted test failure: >> # Subtest: printf-kunit-test >> 1..1 >> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:82 >> vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote >> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' >> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:82 >> vsnprintf(buf, 5, "%pi4|%pI4", ...) wrote '127.', expected '127-' >> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:118 >> kvasprintf(..., "%pi4|%pI4", ...) returned >> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' >> not ok 1 - selftest >> not ok 1 - printf-kunit-test > > Not bad. Rasmus, what do you think? Much better, but that '1..1' and reporting the entire test suite as 1 single (failing or passing) test is (also) a regression. Look at the original >> [0.591409] test_printf: all 388 tests passed or >> [0.619495] test_printf: failed 3 out of 388 tests That's far more informative, and I'd prefer if the summary information (whether in the all-good case or some-failing) included something like this. In particular, I have at some point spotted that I failed to properly hook up a new test case (or perhaps failed to re-compile, or somehow still ran the old kernel binary, don't remember which it was) by noticing that the total number of tests hadn't increased. The new output would not help catch such PEBKACs. Rasmus
Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]
On 23/10/2020 07.02, Sean Christopherson wrote: > On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote: >> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz wrote: >>> >>> The kernel Naresh originally referred to is here: >>> https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/ >> >> Thanks. >> >> And when I started looking at it, I realized that my original idea >> ("just look for __put_user_nocheck_X calls, there aren't so many of >> those") was garbage, and that I was just being stupid. >> >> Yes, the commit that broke was about __put_user(), but in order to not >> duplicate all the code, it re-used the regular put_user() >> infrastructure, and so all the normal put_user() calls are potential >> problem spots too if this is about the compiler interaction with KASAN >> and the asm changes. >> >> So it's not just a couple of special cases to look at, it's all the >> normal cases too. >> >> Ok, back to the drawing board, but I think reverting it is probably >> the right thing to do if I can't think of something smart. >> >> That said, since you see this on x86-64, where the whole ugly trick with that >> >>register asm("%"_ASM_AX) >> >> is unnecessary (because the 8-byte case is still just a single >> register, no %eax:%edx games needed), it would be interesting to hear >> if the attached patch fixes it. That would confirm that the problem >> really is due to some register allocation issue interaction (or, >> alternatively, it would tell me that there's something else going on). > > I haven't reproduced the crash, but I did find a smoking gun that confirms the > "register shenanigans are evil shenanigans" theory. I ran into a similar > thing > recently where a seemingly innocuous line of code after loading a value into a > register variable wreaked havoc because it clobbered the input register. 
> > This put_user() in schedule_tail(): > >if (current->set_child_tid) >put_user(task_pid_vnr(current), current->set_child_tid); > > generates the following assembly with KASAN out-of-line: > >0x810dccc9 <+73>: xor%edx,%edx >0x810dcccb <+75>: xor%esi,%esi >0x810dcccd <+77>: mov%rbp,%rdi >0x810dccd0 <+80>: callq 0x810bf5e0 <__task_pid_nr_ns> >0x810dccd5 <+85>: mov%r12,%rdi >0x810dccd8 <+88>: callq 0x81388c60 <__asan_load8> >0x810dccdd <+93>: mov0x590(%rbp),%rcx >0x810dcce4 <+100>: callq 0x817708a0 <__put_user_4> >0x810dcce9 <+105>: pop%rbx >0x810dccea <+106>: pop%rbp >0x810dcceb <+107>: pop%r12 > > __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by > __asan_load8()'s check on current for the current->set_child_tid dereference. > Yup, and you don't need KASAN to implicitly generate function calls for you. With x86_64 defconfig, I get extern u64 __user *get_destination(int x, int y); void pu_test(void) { u64 big = 0x1234abcd5678; if (put_user(big, get_destination(4, 5))) pr_warn("uh"); } to generate 4d60 : 4d60: 53 push %rbx 4d61: be 05 00 00 00 mov$0x5,%esi 4d66: bf 04 00 00 00 mov$0x4,%edi 4d6b: e8 00 00 00 00 callq 4d70 4d6c: R_X86_64_PC32 get_destination-0x4 4d70: 48 89 c1mov%rax,%rcx 4d73: e8 00 00 00 00 callq 4d78 4d74: R_X86_64_PC32 __put_user_8-0x4 4d78: 85 c9 test %ecx,%ecx 4d7a: 75 02 jne4d7e 4d7c: 5b pop%rbx 4d7d: c3 retq 4d7e: 5b pop%rbx 4d7f: 48 c7 c7 00 00 00 00mov$0x0,%rdi 4d82: R_X86_64_32S .rodata.str1.1+0xfa 4d86: e9 00 00 00 00 jmpq 4d8b 4d87: R_X86_64_PC32 printk-0x4 That's certainly garbage. 
Now, I don't know if it's a sufficient fix (or could break something else), but the obvious first step of rearranging so that the ptr argument is evaluated before the assignment to __val_pu diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 477c503f2753..b5d3290fcd09 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void); #define do_put_user_call(fn,x,ptr) \ ({ \ int __ret_pu; \ - register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX); \ + __typeof__(ptr) __ptr = (ptr); \ + register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX) =
[PATCH] Kbuild.include: remove leftover comment for filechk utility
After commit 43fee2b23895 ("kbuild: do not redirect the first prerequisite for filechk"), the rule is no longer automatically passed $< as stdin, so remove the stale comment. Fixes: 43fee2b23895 ("kbuild: do not redirect the first prerequisite for filechk") Signed-off-by: Rasmus Villemoes --- scripts/Kbuild.include | 2 -- 1 file changed, 2 deletions(-) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 83a1637417e5..08e011175b4c 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -56,8 +56,6 @@ kecho := $($(quiet)kecho) # - If no file exist it is created # - If the content differ the new file is used # - If they are equal no change, and no timestamp update -# - stdin is piped in from the first prerequisite ($<) so one has -# to specify a valid file as first prerequisite (often the kbuild file) define filechk $(Q)set -e; \ mkdir -p $(dir $@); \ -- 2.23.0
Re: [GIT PULL] printk for 5.10 (includes lockless ringbuffer)
On 14/10/2020 16.16, Geert Uytterhoeven wrote: > Hi Petr, > > On Mon, Oct 12, 2020 at 4:50 PM Petr Mladek wrote: >> - Fully lockless ringbuffer implementation, including the support for >> continuous lines. It will allow to store and read messages in any >> situation wihtout the risk of deadlocks and without the need >> of temporary per-CPU buffers. > > linux-m68k-atari_defconfig$ bloat-o-meter vmlinux.old > vmlinux.lockless_ringbuffer > add/remove: 39/16 grow/shrink: 9/15 up/down: 214075/-4362 (209713) > Function old new delta > _printk_rb_static_infos- 180224 +180224 > _printk_rb_static_descs- 24576 +24576 > [...] > > Seriously?!? Or am I being misled by the tools? > > linux-m68k-atari_defconfig$ size vmlinux.old vmlinux.lockless_ringbuffer >textdata bss dec hex filename > 3559108 941716 12 4678596 4763c4 vmlinux.old > 3563922 1152496 175276 4891694 4aa42e vmlinux.lockless_ringbuffer > > Apparently not... Hm, that's quite a lot. And the only reason the buffers don't live entirely in .bss is because a few of their entries have non-zero initializers. Perhaps one could add a .init.text.initialize_static_data section of function pointers, with the _DEFINE_PRINTKRB macro growing something like static void __init __initialize_printkrb_##name(void) { \ _##name##_descs[_DESCS_COUNT(descbits) - 1] = ...; \ _##name##_infos[0] = ...; \ _##name##_infos[_DESCS_COUNT(descbits) - 1] = ...; \ } \ static_data_initializer(__initialize_printkrb_##name); with static_data_initalizer being the obvious yoga for putting a function pointer in the .init.text.initialize_static_data section. Then very early in start_kernel(), probably first thing, iterate that section and call all the functions. But maybe that's not even early enough? Rasmus
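[Editor's note: the ".init-section of function pointers" idea sketched in the reply above can be modeled in userspace with the __start_/__stop_ section symbols that GNU ld (and lld) synthesize for sections whose names are valid C identifiers - the same mechanism the kernel's initcall tables use. All names here are made up; this is GCC/clang + ELF specific and only a sketch of the proposal, not existing kernel code.]

```c
typedef void (*init_fn)(void);

/* Drop a function pointer into a named ELF section; the linker then
 * provides __start_<sec>/__stop_<sec> symbols bracketing the array. */
#define static_data_initializer(fn)					\
	static const init_fn __initptr_##fn				\
	__attribute__((used, section("static_data_init"))) = fn

extern const init_fn __start_static_data_init[];
extern const init_fn __stop_static_data_init[];

/* Example array that would otherwise need a non-zero static
 * initializer for one entry, forcing the whole thing into .data. */
static int example_descs[8];

static void init_example_descs(void)
{
	example_descs[7] = 42;	/* the would-be static initializer */
}
static_data_initializer(init_example_descs);

/* Would be called very early, e.g. first thing in start_kernel(). */
static void run_static_data_initializers(void)
{
	const init_fn *fn;

	for (fn = __start_static_data_init; fn < __stop_static_data_init; fn++)
		(*fn)();
}
```

The arrays then stay in .bss (zero on-disk size), and the few non-zero entries are filled in at boot.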
Re: using one gpio for multiple dht22 sensors
On 14/10/2020 11.12, Peter Rosin wrote: > Hi Rasnus, > > On 2020-10-13 23:34, Rasmus Villemoes wrote: >> Hi Peter >> >> I'm going to hook up a bunch of dht22 humidity (and temperature) sensors >> [1] (drivers/iio/humidity/dht11.c), but partly due to limited number of >> available gpios, I'm also going to use a 74hc4051 multiplexer [2], so >> that all the dht22's actually sit on the same gpio. >> >> It's pretty obvious how the multiplexer is to be described in >> device-tree (I'm probably going to send a patch to add support for an >> optional "enable-gpio", as on the 74hc4051, so that MUX_IDLE_DISCONNECT >> gets supported). >> >> It also seems like io-channel-mux should somehow magically apply to all >> kinds of iio devices, but it can't be that simple. And if it is, I can't >> figure out how to write the DT. So: > > The io-channel-mux is for the situation where you have only one iio-device, > and where you can control its environment with a mux. I.e. it is not about > how the device communicates with the host. You then get one new "virtual" > iio-device for every (valid) state of the mux, and when you access those > iio-devices, the mux is configured to select the correct input/output for > the iio-device in question. At least, it should be possible for output > devices as well, but I guess you kind of get into trouble with the output > signal not persisting when the mux changes state, but that is a problem > for the HW people ;-). I originally used it for a single adc device where > the mux simply selected what to measure. I see, so it's not really applicable here. >> - do I need to teach the dht11.c driver to be mux-aware? >> - currently, a dht11/dht22 shows up in sysfs with just two files, >> in_humidityrelative_input and in_temp_input. Now, should I end up with >> one device and, say, 2*8 files (so that it seems like one sensor with >> eight input channels), or should it show up as eight different devices? 
>> If the latter, I guess the device tree would end up with the same "gpios >> = " spec for all eight - is that even allowed? > > It's not 100% clear to me how this is connected, but I'm guessing that you > have the "DATA-signal" pin of the dht22s connected to the Y pins of the 4051, > and that Z pin of the 4051 is connected to some gpio, so that you are able > to communicate with the various dht22s by controlling the mux. Exactly. And currently, I just have one dht22 in DT, listing the gpio the Z-pin is connected to, and have tested that I can talk to all of them by manipulating the mux from userspace (well, the driver thinks there's only one dht22, so it imposes a 2 second delay before trying to talk to "it" again - which is one reason I want to do this right). > This calls for a mux on the single source of communication from the host > to the various real devices that share this single source of communication. > In other words, more like an i2c-mux than an io-channel-mux. > > I.e., what you need is "the other kind" of mux/gpio driver, i.e. a driver > that "fans out" a single gpio so that other drivers can treat the pins on > the other side of the mux as gpios. Hm, so I should create a new "gpio-controller" driver that presents each of the Y pins as a separate gpio to other DT consumers? But while all the devm_gpiod_get() would then succeed, only one of them could be usable at a time. The "gpio-controller" driver would have to handle any gpiod_xxx call by setting the mux appropriately, then forward the gpiod_xxx call to the Z pin - but it can't really know when the consumer is actually done with the gpio. And the consumer (dht11.c) has no way to know that it should somehow "acquire" and "release" the gpiod when it's not in use. > I guess you'd have to sort out irq-routing through this new mux-gpio driver > since the dht22 driver uses irqs to read input, Well... 
I've had to monkeypatch the dht11 driver to do local_irq_save()/read the gpio tightly in a loop until 80something edges have been seen/local_irq_restore() to get stable readings (maybe I'll do that as a real patch hanging off a module parameter - the dht22 "1-wire" protocol is kinda crappy...), so that's not really a problem. > > I'm not an expert on the intricacies of the gpio subsystem... > > Does that help? Somewhat. Thanks for the quick reply, I'll have to do some more digging and experimentation. Rasmus
Re: [PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC
On 13/10/2020 23.09, Al Viro wrote: > On Tue, Oct 13, 2020 at 04:06:08PM +0200, Giuseppe Scrivano wrote: >> +spin_lock(&cur_fds->file_lock); >> +fdt = files_fdtable(cur_fds); >> +cur_max = fdt->max_fds - 1; >> +max_fd = min(max_fd, cur_max); >> +while (fd <= max_fd) >> +__set_close_on_exec(fd++, fdt); >> +spin_unlock(&cur_fds->file_lock); > > First of all, this is an atrocious way to set all bits > in a range. What's more, you don't want to set it for *all* > bits - only for the ones present in open bitmap. It's probably > harmless at the moment, but let's not create interesting surprises > for the future. Eh, why not? They can already be set for unallocated slots: commit 5297908270549b734c7c2556745e2385b6d4941d Author: Mateusz Guzik Date: Tue Oct 3 12:58:14 2017 +0200 vfs: stop clearing close on exec when closing a fd Codepaths allocating a fd always make sure the bit is set/unset depending on flags, thus clearing on close is redundant. And while we're on that subject, yours truly suggested exactly that two years prior [1], with a follow-up [2] in 2018 to do what wasn't done in 5297908, but (still) seems like obvious micro-optimizations, given that the close_on_exec bitmap is not maintained as a subset of the open bitmap. Mind taking a look at [2]? [1] https://lore.kernel.org/lkml/1446543679-28849-1-git-send-email-li...@rasmusvillemoes.dk/t/#u [2] https://lore.kernel.org/lkml/20181024160159.25884-1-li...@rasmusvillemoes.dk/ Rasmus
using one gpio for multiple dht22 sensors
Hi Peter Since you're the author of io-channel-mux.txt, gpio-mux.txt and mux-controller.txt, I hope you don't mind me asking some perhaps silly questions. I'm going to hook up a bunch of dht22 humidity (and temperature) sensors [1] (drivers/iio/humidity/dht11.c), but partly due to limited number of available gpios, I'm also going to use a 74hc4051 multiplexer [2], so that all the dht22's actually sit on the same gpio. It's pretty obvious how the multiplexer is to be described in device-tree (I'm probably going to send a patch to add support for an optional "enable-gpio", as on the 74hc4051, so that MUX_IDLE_DISCONNECT gets supported). It also seems like io-channel-mux should somehow magically apply to all kinds of iio devices, but it can't be that simple. And if it is, I can't figure out how to write the DT. So: - do I need to teach the dht11.c driver to be mux-aware? - currently, a dht11/dht22 shows up in sysfs with just two files, in_humidityrelative_input and in_temp_input. Now, should I end up with one device and, say, 2*8 files (so that it seems like one sensor with eight input channels), or should it show up as eight different devices? If the latter, I guess the device tree would end up with the same "gpios = " spec for all eight - is that even allowed? If you can show how the DT is supposed to describe this situation, I'll try to fill out the missing pieces on the driver side. [I could also just not describe the multiplexer at all and control everything about it by toggling gpios from userspace, and just have a single dht22 in DT, but I prefer doing this "the right way" if it's feasible.] Thanks, Rasmus Just FTR: [1] https://www.electroschematics.com/wp-content/uploads/2015/02/DHT22-datasheet.pdf [2] https://assets.nexperia.com/documents/data-sheet/74HC_HCT4051.pdf
Re: [PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC
On 13/10/2020 22.54, Christian Brauner wrote: > On Tue, Oct 13, 2020 at 04:06:08PM +0200, Giuseppe Scrivano wrote: > > Hey Guiseppe, > > Thanks for the patch! > >> When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't >> immediately close the files but it sets the close-on-exec bit. > > Hm, please expand on the use-cases a little here so people know where > and how this is useful. Keeping the rationale for a change in the commit > log is really important. > > I think I don't have quarrels with this patch in principle but I wonder > if something like the following wouldn't be easier to follow: > > diff --git a/fs/file.c b/fs/file.c > index 21c0893f2f1d..872a4098c3be 100644 > --- a/fs/file.c > +++ b/fs/file.c > @@ -672,6 +672,32 @@ int __close_fd(struct files_struct *files, unsigned fd) > } > EXPORT_SYMBOL(__close_fd); /* for ksys_close() */ > > +static inline void __range_cloexec(struct files_struct *cur_fds, > +unsigned int fd, unsigned max_fd) > +{ > + struct fdtable *fdt; > + spin_lock(&cur_fds->file_lock); > + fdt = files_fdtable(cur_fds); > + while (fd <= max_fd) > + __set_close_on_exec(fd++, fdt); Doesn't that want to be bitmap_set(fdt->close_on_exec, fd, max_fd - fd + 1) to do word-at-a-time? I assume this would mostly be called with (3, ~0U) as arguments or something like that. Rasmus
Re: [GIT PULL] io_uring updates for 5.10-rc1
On 13/10/2020 21.49, Jens Axboe wrote:
> On 10/13/20 1:46 PM, Linus Torvalds wrote:
>> On Mon, Oct 12, 2020 at 6:46 AM Jens Axboe wrote:
>>>
>>> Here are the io_uring updates for 5.10.
>>
>> Very strange. My clang build gives a warning I've never seen before:
>>
>>   /tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
>> attributes for .data..read_mostly
>>
>> and looking at what clang generates for the *.s file, it seems to be
>> the "section" line in:
>>
>>         .type   io_op_defs,@object      # @io_op_defs
>>         .section        .data..read_mostly,"a",@progbits
>>         .p2align        4
>>
>> I think it's the combination of "const" and "__read_mostly".
>>
>> I think the warning is sensible: how can a piece of data be both
>> "const" and "__read_mostly"? If it's "const", then it's not "mostly"
>> read - it had better be _always_ read.
>>
>> I'm letting it go, and I've pulled this (gcc doesn't complain), but
>> please have a look.
>
> Huh weird, I'll take a look. FWIW, the construct isn't unique across
> the kernel.

Citation needed. There's lots of "pointer to const foo" stuff declared
as __read_mostly, but I can't find any objects that are themselves both
const and __read_mostly. Other than io_op_defs and io_uring_fops now,
that is.

But... there's something a little weird:

$ grep read_most -- fs/io_uring.s
        .section        .data..read_mostly,"a",@progbits
$ readelf --wide -S fs/io_uring.o | grep read_most
  [32] .data..read_mostly PROGBITS 01b4e0 000188 00  WA  0   0 32

(this is with gcc/gas). So despite that .section directive not saying
"aw", the section got the W flag anyway. There are lots of

        .section        "__tracepoints_ptrs", "a"
        .pushsection .smp_locks,"a"

in the .s file, and those sections do end up with just the A bit in the
.o file. Does gas maybe somehow special-case a section name starting
with .data?

Rasmus
Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall
On 10/07/2020 17.57, Matthew Wilcox wrote: > On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote: >> The ability to check open file descriptions for equality (without >> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be >> useful outside of the checkpoint/restore use case - for example, >> systemd uses kcmp() to deduplicate the per-service file descriptor >> store. >> >> Make it possible to have the kcmp() syscall without the full >> CONFIG_CHECKPOINT_RESTORE. > > If systemd is using it, is it even worth making it conditional any more? > Maybe for CONFIG_EXPERT builds, it could be de-selectable. > [hm, I dropped the ball, sorry for the necromancy] Well, first, I don't want to change any defaults here, if that is to be done, it should be a separate patch. Second, yes, systemd uses it for the de-duplication, and for that reason recommends CONFIG_CHECKPOINT_RESTORE (at least, according to their README) - but I'm not aware of any daemons that actually make use of systemd's file descriptor store, so it's not really something essential to every systemd-based system out there. It would be nice if systemd could change its recommendation to just CONFIG_KCMP_SYSCALL. But it's also useful for others, e.g. I have some code that wants to temporarily replace stdin/stdout/stderr with some other file descriptors, but needs to preserve the '0 is a dup of 1, or not' state (i.e., is the same struct file) - that cannot reliably be determined from fstat()/lseek(SEEK_CUR)/F_GETFL or whatever else one could throw at an fd. Rasmus
Re: [PATCH] lib: Convert test_printf.c to KUnit
On 12/10/2020 22.46, Brendan Higgins wrote:
> On Fri, Aug 21, 2020 at 03:28:49PM +0300, Andy Shevchenko wrote:
>> On Fri, Aug 21, 2020 at 01:37:10PM +0200, Petr Mladek wrote:
>>> On Mon 2020-08-17 09:06:32, Rasmus Villemoes wrote:
>>>> On 17/08/2020 06.30, Arpitha Raghunandan wrote:
>>>>> Converts test lib/test_printf.c to KUnit.
>>>>> More information about KUnit can be found at
>>>>> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
>>>>> KUnit provides a common framework for unit tests in the kernel.
>>>>
>>>> So I can continue to build a kernel with some appropriate CONFIG set to
>>>> y, boot it under virt-me, run dmesg and see if I broke printf? That's
>>>> what I do now, and I don't want to have to start using some enterprisy
>>>> framework.
>>>
>>> I had the same concern. I have tried it.
>
> Sorry you feel that way. Do you have any suggestions on how we can make
> it seem less enterprisy? Seems like there are people here who are not a
> fan of the output format, some of which we can fix here, some of which
> is part of KTAP[1].

I'm fine with machine-readable TAP, but I most definitely also want
human-readable, which means all the excessive and pointless lines need
to go away.

>> Which raises an obvious question: did the people who convert this test this
>> themselves? Looks like a janitor work in the area without understanding the
>> area good enough.
>
> Looks to me like Arpitha ran it, but you are right, we don't have a lot
> of familiarity with this area; we were treating it as "janitor work" as
> you say.
>
> Our intention was just to take some existing tests and, as non-invasively
> as possible, get them to report using a common format, and maybe even
> get some of the tests to follow a common pattern.
>
>> Probably I will NAK all those patches from now on, until it will be good
>> commit messages and cover of risen aspects, including reference to
>> before and after outcome for passed and failed test cases.
>
> Fair enough, hopefully we can address these issues in the next revision.
>
> One issue though, with the "before and after outcome" you are
> referencing; are you referring to the issue that Petr pointed out in how
> they are inconsistent:
>
> + original code: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote '127.0',
>   expected '127-0'
> + kunit code: vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote
>   '127.000.000.001|127', expected '127-000.000.001|127'
>
> (I think Rasmus addressed this.) Or are you referring to something
> else?

Yeah, that change is fine and expected; can we stop bringing that up?
It's all the explicit "memcmp() == 0 failed" gunk at least I am
concerned with. If you can get rid of that (basically, stop stringifying
the code, that's completely irrelevant) and just get the messages from
the test itself that explain what went wrong, I'm fine with
interspersing that with a few TAP-readable lines. But things like

    Expected memcmp(test_buffer, expect, written) == 0, but
        memcmp(test_buffer, expect, written) == 1
        0 == 0

are utterly useless. We're not _testing_ memcmp, we're _using_ it to
know if vsprintf() did as we expected. So just mechanically changing
"memcmp() == 0" into "SOME_MACRO(memcmp(), 0)" is never going to work,
at least when SOME_MACRO does the stringify and ends up producing the
above. But if you can end up producing

[   56.795433] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76
vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote '127.000.000.001|127',
expected '127-000.000.001|127'

that's fine; that's basically just prepending an EXPECTATION FAILED line
to the existing output. So doing it properly would probably be either

- change the existing pr_warn()s to use some KUNIT macro that generates
  whatever extra info is needed by TAP, in addition to the current
  human-readable message, and/or

- just add a few lines of TAP-suitable FAIL/PASS lines here and there

but let me repeat that the control flow (early returns) in do_test()
cannot be modified.

Rasmus
Re: [PATCH] bcache: Use #ifdef instead of boolean variable
On 09/10/2020 20.34, Alex Dewar wrote: > The variable async_registration is used to indicate whether or not > CONFIG_BCACHE_ASYNC_REGISTRATION is enabled, triggering a (false) > warning in Coverity about unreachable code. However, it seems clearer in > this case just to use an #ifdef, so let's do that instead. > > Addresses-Coverity-ID: 1497746: Control flow issues (DEADCODE) I think that coverity check needs to be ignored. The kernel is full of things that are supposed to be eliminated by the compiler, but still checked for valid syntax etc. Often it's even more hidden than this, something like // some header #ifdef CONFIG_FOO int foo(void); #else static inline int foo(void) { return 0; } #endif // code if (foo()) { ... // this goes away for CONFIG_FOO=n } > Signed-off-by: Alex Dewar > --- > drivers/md/bcache/super.c | 40 +-- > 1 file changed, 17 insertions(+), 23 deletions(-) > > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c > index 46a00134a36a..6d4127881c6a 100644 > --- a/drivers/md/bcache/super.c > +++ b/drivers/md/bcache/super.c > @@ -2504,11 +2504,6 @@ static ssize_t register_bcache(struct kobject *k, > struct kobj_attribute *attr, > struct cache_sb_disk *sb_disk; > struct block_device *bdev; > ssize_t ret; > - bool async_registration = false; > - > -#ifdef CONFIG_BCACHE_ASYNC_REGISTRATION > - async_registration = true; > -#endif If anything, this should simply be changed to bool async_registration = IS_ENABLED(CONFIG_BCACHE_ASYNC_REGISTRATION); Rasmus
Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers
On 30/09/2020 15.35, Steven Rostedt wrote: > On Wed, 30 Sep 2020 10:06:24 +0200 > Rasmus Villemoes wrote: > >> True. But remember that printk is called from _everywhere_, with all >> sorts of locks held and/or preemption disabled or whatnot, and every >> cycle spent in printk makes those windows wider. Doubling the cost of >> every single printk by unconditionally doing vsnprintf() twice is a bad >> idea. > > But the console output is usually magnitudes more expensive than the > vsnprintf(), would doing it twice really make a difference? AFAIU, not every message gets printed to the console directly - syslog(2): /proc/sys/kernel/printk /proc/sys/kernel/printk is a writable file containing four integer val‐ ues that influence kernel printk() behavior when printing or logging error messages. The four values are: console_loglevel Only messages with a log level lower than this value will be printed to the console. The default value for this field is DEFAULT_CONSOLE_LOGLEVEL (7), but it is set to 4 if the kernel command line contains the word "quiet", So the normal state of things is that you don't pay the cost of printing to the console for all the pr_debug (ok, they may be compiled out or run-time disabled depending on DYNAMIC_DEBUG etc.), nor info, notice, warn. For those messages that are not directly written to the console, the vsnprintf() is a large part of the cost (not exactly half, of course, so doubling is an exaggeration, but whether it's 70% or 100% doesn't really matter). I'm not at all concerned about pr_err and above becoming more expensive, they are rare. 
But random drivers are filled with random pr_info in random contexts - just a small selection from dmesg -x shows these really important things: kern :info : [ 4631.338105] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1 kern :info : [ 4642.218100] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0 kern :info : [ 4643.882038] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 1 kern :info : [ 4667.562011] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179 - Link status is: 0 ... kern :info : [ 9149.215456] [drm] ring test on 1 succeeded in 1 usecs kern :info : [ 9149.215459] [drm] ring test on 2 succeeded in 1 usecs kern :info : [ 9149.215466] [drm] ring test on 3 succeeded in 4 usecs and if I'm reading the code correctly, the former is even an example of something that happens in irq context. Rasmus
Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers
On 25/09/2020 10.28, Petr Mladek wrote: > On Thu 2020-09-24 14:32:49, Rasmus Villemoes wrote: >> On 24/09/2020 11.54, Rasmus Villemoes wrote: >>> On 23/09/2020 17.11, Petr Mladek wrote: >>>> On Tue 2020-09-22 17:44:14, John Ogness wrote: >>>>> vprintk_store() is using a single static buffer as a temporary >>>>> sprint buffer for the message text. This will not work once >>>>> @logbuf_lock is removed. Replace the single static buffer with a >>>>> pool of buffers. >>>> >>>> The buffer is used because we do not know the length of the >>>> formatted message to reserve the right space in the ring buffer >>>> in advance. >>>> >>>> There was the idea to call vsprintf(NULL, fmt, args) to count >>>> the length in advance. >>> >>> sprintf is dog slow. > > Well, printk() is a slow path. It has never been too optimized for > speed. The main purpose is to report problems and eventually some > interesting information. True. But remember that printk is called from _everywhere_, with all sorts of locks held and/or preemption disabled or whatnot, and every cycle spent in printk makes those windows wider. Doubling the cost of every single printk by unconditionally doing vsnprintf() twice is a bad idea. Rasmus
Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps
On 24/09/2020 15.58, YiFei Zhu wrote: > On Thu, Sep 24, 2020 at 8:46 AM Rasmus Villemoes > wrote: >> But one thing I'm wondering about and I haven't seen addressed anywhere: >> Why build the bitmap on the kernel side (with all the complexity of >> having to emulate the filter for all syscalls)? Why can't userspace just >> hand the kernel "here's a new filter: the syscalls in this bitmap are >> always allowed noquestionsasked, for the rest, run this bpf". Sure, that >> might require a new syscall or extending seccomp(2) somewhat, but isn't >> that a _lot_ simpler? It would probably also mean that the bpf we do get >> handed is a lot smaller. Userspace might need to pass a couple of >> bitmaps, one for each relevant arch, but you get the overall idea. > > Perhaps. The thing is, the current API expects any filter attaches to > be "additive". If a new filter gets attached that says "disallow read" > then no matter whatever has been attached already, "read" shall not be > allowed at the next syscall, bypassing all previous allowlist bitmaps > (so you need to emulate the bpf anyways here?). We should also not > have a API that could let anyone escape the secomp jail. Say "prctl" > is permitted but "read" is not permitted, one must not be allowed to > attach a bitmap so that "read" now appears in the allowlist. The only > way this could potentially work is to attach a BPF filter and a bitmap > at the same time in the same syscall, which might mean API redesign? Yes, the man page would read something like SECCOMP_SET_MODE_FILTER_BITMAP The system calls allowed are defined by a pointer to a Berkeley Packet Filter (BPF) passed via args. This argument is a pointer to a struct sock_fprog_bitmap; with that struct containing whatever information/extra pointers needed for passing the bitmap(s) in addition to the bpf prog. And SECCOMP_SET_MODE_FILTER would internally just be updated to work as-if all-zero allow-bitmaps were passed along. 
The internal kernel bitmap would just be the and of the bitmaps in the filter stack. Sure, it's UAPI, so would certainly need more careful thought on details of just how the arg struct looks like etc. etc., but I was wondering why it hadn't been discussed at all. >> I'm also a bit worried about the performance of doing that emulation; >> that's constant extra overhead for, say, launching a docker container. > > IMO, launching a docker container is so expensive this should be negligible. Regardless, I'd like to see some numbers, certainly for the "how much faster does a getpid() or read() or any of the other syscalls that nobody disallows" get, but also "what's the cost of doing that emulation at seccomp(2) time". Rasmus
Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps
On 24/09/2020 01.29, Kees Cook wrote: > rfc: > https://lore.kernel.org/lkml/20200616074934.1600036-1-keesc...@chromium.org/ > alternative: > https://lore.kernel.org/containers/cover.1600661418.git.yifei...@illinois.edu/ > v1: > - rebase to for-next/seccomp > - finish X86_X32 support for both pinning and bitmaps > - replace TLB magic with Jann's emulator > - add JSET insn > > TODO: > - add ALU|AND insn > - significantly more testing > > Hi, > > This is a refresh of my earlier constant action bitmap series. It looks > like the RFC was missed on the container list, so I've CCed it now. :) > I'd like to work from this series, as it handles the multi-architecture > stuff. So, I agree with Jann's point that the only thing that matters is that always-allowed syscalls are indeed allowed fast. But one thing I'm wondering about and I haven't seen addressed anywhere: Why build the bitmap on the kernel side (with all the complexity of having to emulate the filter for all syscalls)? Why can't userspace just hand the kernel "here's a new filter: the syscalls in this bitmap are always allowed noquestionsasked, for the rest, run this bpf". Sure, that might require a new syscall or extending seccomp(2) somewhat, but isn't that a _lot_ simpler? It would probably also mean that the bpf we do get handed is a lot smaller. Userspace might need to pass a couple of bitmaps, one for each relevant arch, but you get the overall idea. I'm also a bit worried about the performance of doing that emulation; that's constant extra overhead for, say, launching a docker container. Regardless of how the kernel's bitmap gets created, something like + if (nr < NR_syscalls) { + if (test_bit(nr, bitmaps->allow)) { + *filter_ret = SECCOMP_RET_ALLOW; + return true; + } probably wants some nospec protection somewhere to avoid the irony of seccomp() being used actively by bad guys. Rasmus
Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers
On 24/09/2020 11.54, Rasmus Villemoes wrote: > On 23/09/2020 17.11, Petr Mladek wrote: >> On Tue 2020-09-22 17:44:14, John Ogness wrote: >>> vprintk_store() is using a single static buffer as a temporary >>> sprint buffer for the message text. This will not work once >>> @logbuf_lock is removed. Replace the single static buffer with a >>> pool of buffers. >> >> The buffer is used because we do not know the length of the >> formatted message to reserve the right space in the ring buffer >> in advance. >> >> There was the idea to call vsprintf(NULL, fmt, args) to count >> the length in advance. > > sprintf is dog slow. If you do this, perhaps say "we can afford to use > 128 bytes of stack" and do vsprintf(stackbuf, 128, fmt, args) to do the > counting, and in the vast majority of cases where the text fits we don't > need to do vsprintf() again. Or, since 128 bytes of stack may be too much, combine the "pre-allocate a few buffers" with "but fall back to vsprintf(NULL) if we can't get one". That should allow choosing the X in X*1024 smaller than what worst-case requires (which will never actually be used, assuming the machine is doing something useful rather than just printk'ing all day long) and still works even when a tmp buffer can't be obtained. Rasmus
Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers
On 23/09/2020 17.11, Petr Mladek wrote: > On Tue 2020-09-22 17:44:14, John Ogness wrote: >> vprintk_store() is using a single static buffer as a temporary >> sprint buffer for the message text. This will not work once >> @logbuf_lock is removed. Replace the single static buffer with a >> pool of buffers. > > The buffer is used because we do not know the length of the > formatted message to reserve the right space in the ring buffer > in advance. > > There was the idea to call vsprintf(NULL, fmt, args) to count > the length in advance. sprintf is dog slow. If you do this, perhaps say "we can afford to use 128 bytes of stack" and do vsprintf(stackbuf, 128, fmt, args) to do the counting, and in the vast majority of cases where the text fits we don't need to do vsprintf() again. Rasmus
Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers
On 24/09/2020 10.53, Sergey Senozhatsky wrote: > On (20/09/24 10:45), Petr Mladek wrote: >> On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote: >>> On (20/09/23 17:11), Petr Mladek wrote: AFAIK, there is one catch. We need to use va_copy() around the 1st call because va_format can be proceed only once. >>> >>> Current printk() should be good enough for reporting, say, "Kernel >>> stack overflow" errors. Is extra pressure that va_copy() adds something >>> that we need to consider? >> >> The thing is that vsprintf() traverses the arguments using va_arg(). >> It modifies internal values so that the next va_arg() will read >> the next value. > > Yes, I understand the purpose of va_copy(). I'm asking if we are > always on the safe side doing va_copy for every printk (+ potential > recursive va_copy-s). va_copy doesn't really add any extra stack use worth talking about. When ABI says all arguments are passed on stack, va_list is just a pointer into the arguments, and va_copy merely copies that pointer. When some arguments are passed in register, the function prologue sets up a "register save area" where those registers are stashed, and va_list then contains two pointers: one to the reg save area, one to the place in the stack where the rest of the arguments are, plus a bit of control information on how many of the register arguments have been consumed so far (and that control info is the only reason one must "reset" a va_list, or rather use a copy that was made before consumption started). In either case, an extra va_list doesn't cost more than 24 bytes or so of stack - even if printk() was called with 100 arguments, those 90-100ish arguments are only stored once on the stack. Rasmus
Re: WARNING in ex_handler_uaccess
On 19/09/2020 02.17, Al Viro wrote: > On Fri, Sep 18, 2020 at 05:07:43PM -0700, Andy Lutomirski wrote: >> On Fri, Sep 18, 2020 at 4:55 PM Al Viro wrote: >>> >>> On Fri, Sep 18, 2020 at 04:31:33PM -0700, Andy Lutomirski wrote: >>> check_zeroed_user() looks buggy. It does: if (!user_access_begin(from, size)) return -EFAULT; unsafe_get_user(val, (unsigned long __user *) from, err_fault); This is wrong if size < sizeof(unsigned long) -- you read outside the area you verified using user_access_begin(). >>> >>> Read the code immediately prior to that. from will be word-aligned, >>> and size will be extended accordingly. If the area acceptable for >>> user_access_begin() ends *NOT* on a word boundary, you have a problem >>> and I would strongly recommend to seek professional help. >>> >>> All reads in that thing are word-aligned and word-sized. So I very >>> much doubt that your analysis is correct. >> >> Maybe -ETOOTIRED, but I seriously question the math in here. Suppose >> from == (unsigned long *)1 and size == 1. Then align is 1, and we do: >> >> from -= align; >> size += align; >> >> So now from = 0 and size = 2. Now we do user_access_begin(0, 2) and >> then immediately read 4 or 8 bytes. No good. > > Could you explain what kind of insane hardware manages to do #PF-related > checks (including SMAP, whatever) with *sub*WORD* granularity? > > If it's OK with 16bit read from word-aligned address, but barfs on 64bit > one... I want to know what the hell had its authors been smoking. > So, not sure how the above got triggered, but I notice there might be an edge case in check_zeroed_user(): from -= align; size += align; if (!user_read_access_begin(from, size)) return -EFAULT; unsafe_get_user(val, (unsigned long __user *) from, err_fault); Suppose size is (size_t)-3 and align is 3. What's the convention for access_ok(whatever, 0)? Is that equivalent to access_ok(whatever, 1), or is it always true (or $ARCH-dependent)? 
But, AFAICT, no current caller of check_zeroed_user can end up passing in a size that can overflow to 0. E.g. for the case at hand, size cannot be more than SIZE_MAX-24. Rasmus
Re: [PATCH printk 2/3] printk: move dictionary keys to dev_printk_info
On 18/09/2020 14.13, Petr Mladek wrote: > On Fri 2020-09-18 08:16:37, Rasmus Villemoes wrote: >> On 17/09/2020 15.16, John Ogness wrote: >> >>> if (dev->class) >>> subsys = dev->class->name; >>> else if (dev->bus) >>> subsys = dev->bus->name; >>> else >>> - return 0; >>> + return; >>> >>> - pos += snprintf(hdr + pos, hdrlen - pos, "SUBSYSTEM=%s", subsys); >>> - if (pos >= hdrlen) >>> - goto overflow; >>> + snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), subsys); >> >> It's unlikely that subsys would contain a %, but this will be yet >> another place to spend brain cycles ignoring if doing static analysis. >> So can we not do this. Either of strXcpy() for X=s,l will do the same >> thing, and likely faster. > > Good point! Better be on the safe size in a generic printk() API. > > Well, I am afraid that this would be only small drop in a huge lake. > class->name and bus->name seems to be passed to %s in so many > *print*() calls all over the kernel code. So what? printf("%s", some_string_that_might_contain_percent_chars) is not a problem. printf(some_string_that_might_contain_percent_chars) is. And yes, one could do snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), "%s", subsys); but one might as well avoid the snprintf overhead and use one of the strX functions that have the exact same semantics (copy as much as there's room for, ensure nul-termination). Rasmus
Re: [PATCH printk 2/3] printk: move dictionary keys to dev_printk_info
On 17/09/2020 15.16, John Ogness wrote: > if (dev->class) > subsys = dev->class->name; > else if (dev->bus) > subsys = dev->bus->name; > else > - return 0; > + return; > > - pos += snprintf(hdr + pos, hdrlen - pos, "SUBSYSTEM=%s", subsys); > - if (pos >= hdrlen) > - goto overflow; > + snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), subsys); It's unlikely that subsys would contain a %, but this will be yet another place to spend brain cycles ignoring if doing static analysis. So can we not do this. Either of strXcpy() for X=s,l will do the same thing, and likely faster. Rasmus
Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable
On 17/09/2020 14.22, Nico Schottelius wrote: > > Thanks for the patch Rasmus. Overall it looks good to me, be aligned to > the stable patch submission rules makes sense. A tiny thing though: > > I did not calculate the exact collision probability with 12 characters For reference, the math is something like this: Consider a repo with N+1 objects. We look at one specific object (for setlocalversion its the head commit being built, for the stable rules its whatever particular commit one is interested in for backporting), and want to know the probability that its sha1 collides with some other object in the first b bits (here b=48). Assuming the sha1s are independent and uniformly distributed, the probability of not colliding with one specific other commit is x=1-1/2^b, and the probability of not colliding with any of the other N commits is x^N, making the probability of a collision 1-x^N = (1-x)(1+x+x^2+...+x^{N-1}). Now the N terms in the second factor are very-close-to-but-slightly-smaller-than 1, so an upper bound for this probability is (1-x)N = N/2^b, which is also what one would naively expect. [This estimate is always valid, but it becomes a void statement of "the probability is less then 1" when N is >= 2^b]. So, assuming some vendor kernel repo that has all of Greg's stable.git (around 10M objects I think) and another 10M objects because random vendor, that works out to 20e6/2^48 = 7.1e-8, 71 ppb. > So I suggest you introduce something on the line of: > > ... > num_chars=12 > ... > --abbrev=$num_chars I considered that, but it becomes quite ugly since it needs to get into the awk script (as a 13, though perhaps we could get awk to do the +1, I don't really speak awk), where we'd then need to use " instead of ' and then escape the $ that are to be interpreted by awk and not the shell. So I think it's more readable with hardcoding and comments explaining why they are there; should anyone ever want to change 12. Rasmus
[PATCH v2] scripts/setlocalversion: make git describe output more reliable
When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: the output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
    2.11 (git.git commit e6c587) the default is dynamic based on the
    number of objects in the repo

(2) hence even if both run the same git version, the output can differ
    based on how many remotes are being tracked (or just lots of local
    development branches or plain old garbage)

(3) and of course somebody could have a core.abbrev config setting in
    ~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are consistent
between developers and buildbots, make sure the abbreviated sha1 always
consists of exactly 12 hex characters. That is consistent with the
current rule for -stable patches, and is almost always enough to
identify the head commit unambiguously - in the few cases where it does
not, the v5.4.3-00021- prefix would certainly nail it down.
Signed-off-by: Rasmus Villemoes
---
v2: use 12 instead of 15, and ensure that the result does have exactly
12 hex chars.

 scripts/setlocalversion | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 20f2efd57b11..bb709eda96cd 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()
 
 	# Check for git and a git repo.
 	if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-	   head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+	   head=$(git rev-parse --verify HEAD 2>/dev/null); then
 
 		# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
 		# it, because this version is defined in the top level Makefile.
@@ -59,11 +59,22 @@ scm_version()
 		fi
 		# If we are past a tagged commit (like
 		# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-		if atag="$(git describe 2>/dev/null)"; then
-			echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
-
-		# If we don't have a tag at all we print -g{commitish}.
+		#
+		# Ensure the abbreviated sha1 has exactly 12
+		# hex characters, to make the output
+		# independent of git version, local
+		# core.abbrev settings and/or total number of
+		# objects in the current repository - passing
+		# --abbrev=12 ensures a minimum of 12, and the
+		# awk substr() then picks the 'g' and first 12
+		# hex chars.
+		if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+			echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}'
+
+		# If we don't have a tag at all we print -g{commitish},
+		# again using exactly 12 hex chars.
 		else
+			head="$(echo $head | cut -c1-12)"
 			printf '%s%s' -g $head
 		fi
 	fi
-- 
2.23.0
Re: [PATCH] scripts/setlocalversion: make git describe output more reliable
On 16/09/2020 16.28, Masahiro Yamada wrote: > On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes > wrote: >> >> On 10/09/2020 21.05, Brian Norris wrote: >>> On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada >>> wrote: >>>> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes >>>> wrote: >>>>> So in order to avoid `uname -a` output relying on such random details >>>>> of the build environment which are rather hard to ensure are >>>>> consistent between developers and buildbots, use an explicit >>>>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >>>>> for the unlikely case of no signed tags being usable). >>> >>> For the patch: >>> >>> Reviewed-by: Brian Norris >>> >>>> I agree that any randomness should be avoided. >>>> >>>> My question is, do we need 15-digits? >>> ... >>>> So, I think the conflict happens >>>> only when we have two commits that start with the same 7-digits >>>> in the _same_ release. Is this correct? >> >> No. >> >>> For git-describe (the case where we have a tag to base off): >>> "use digits, or as many digits as needed to form a unique object name" >> >> Yes, the abbreviated hash that `git describe` produces is unique among >> all objects (and objects are more than just commits) in the current >> repo, so what matters for probability-of-collision is the total number >> of objects - linus.git itself probably grows ~6 objects per release. >> >> As for "do we need 15 digits", well, in theory the next time I merge the >> next rt-stable tag into our kernel I could end up with a commit that >> matches some existing object in the first 33 hex chars at which point I >> should have chosen 34 - but of course that's so unlikely that it's >> irrelevant. 
>> >> But the upshot of that is that there really is no objective answer to >> "how many digits do we need", so it becomes a tradeoff between "low >> enough probability that anyone anywhere in the next few years would have >> needed more than X when building their own kernel" and readability of >> `uname -r` etc. So I decided somewhat arbitrarily that each time one >> rolls a new release, there should be less than 1e-9 probability that >> HEAD collides with some other object when abbreviated to X hexchars. >> X=12 doesn't pass that criteria even when the repo has only 10M objects >> (and, current linus.git already has objects that need 12 to be unique, >> so such collisions do happen in practice, though of course very rarely). >> 13 and 14 are just weird numbers, so I ended with 15, which also allows >> many many more objects in the repo before the probability crosses that >> 1e-9 threshold. >> > > > This is because you use the output from git as-is. > > Why are you still trying to rely on such obscure behavior of git? > > > It is pretty easy to get the fixed number of digits reliably. > > For example, > git rev-parse --verify HEAD 2>/dev/null | cut -c1-7 > > > It always returns a 7-digits hash, > and two different people will get the same result for sure. > > Your solution, --short=15, means "at least 15", > which still contains ambiguity. > > > > As I already noted, the kernelrelease string is > constructed in this format: > > -- > > > For example, if I enable CONFIG_LOCALVERSION_AUTO=y > in today's Linus tree, I got this: > > 5.9.0-rc5-5-gfc4f28bb3daf > > > What if the number of digits were 7? > > 5.9.0-rc5-5-gfc4f28b > > I see no problem here. The problem is that currently, the build relies on things which cannot easily be controlled or reproduced between different developers: Apart from git version (which is reasonable to mandate is the same, just like one must use same compiler, binutils etc. 
to get binary reproducible output), it depends on whether ~/.gitconfig has a core.abbrev setting - and even worse, it depends on the _total number of objects in the source git repo_, i.e. something that has nothing to do with what is currently in the work tree at all. And that leads to real bugs when one builds external modules that end up in one directory in the rootfs, while `uname -r` tells modprobe to look in some other directory (differing in the length of the abbreviated hash). > Even if we have another object that starts with "fc4f28b", > the kernelrelease string is still unique thanks to the > - prefix. > > Why do we care about the uniq
Re: [PATCH] scripts/setlocalversion: make git describe output more reliable
On 16/09/2020 20.01, Masahiro Yamada wrote: > On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes > wrote: >> >> On 16/09/2020 16.28, Masahiro Yamada wrote: >>> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes >>> wrote: >>>> >>> Why do we care about the uniqueness of the abbreviated >>> hash in the whole git history? >> >> Because when I ask a customer "what kernel are you running", and they >> tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely >> identifies the right commit, instead of me having to dig around each of >> the commits starting with that prefix and see which one of them matches >> "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near >> enough for that today, which is why Linus himself did that "auto-tune >> abbrev based on repo size" patch for git.git. > > I like: > >git rev-parse --verify HEAD 2>/dev/null | cut -c1-15 > > better than: > >git rev-parse --verify --short=15 HEAD 2>/dev/null > > The former produces the deterministic kernelrelease string. > > > But, let's reconsider if we need as long as 15-digits. > > There are a couple of kinds of objects in git: commit, tree, blob. > > I think you came up with 15-digits to ensure the uniqueness > in _all_ kinds of objects. > > But, when your customer says "4.19.45-rt67-00567-123abc8", > 123abc8 is apparently a commit instead of a tree or a blob. > > > > In the context of the kernel version, we can exclude > tree and blob objects. > > In other words, I think "grows ~6 objects per release" > is somewhat over-estimating. > > We have ~15000 commits per release. So, the difference > is just 4x factor, though... > > > > BTW, I did some experiments, and I noticed > "git log" accepts a shorter hash > than "git show" does presumably because > "git log" only takes a commit. > > > > For example, "git show 06a0d" fails, but > "git log 06a0d" works. 
> > > masahiro@oscar:~/ref/linux$ git show 06a0d > error: short SHA1 06a0d is ambiguous > hint: The candidates are: > hint: 06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued > 'softback_lines' cursor() argument > hint: 06a0d81911b3 tree > hint: 06a0dc5a84d2 tree > hint: 06a0d1947c77 blob > hint: 06a0df434249 blob > fatal: ambiguous argument '06a0d': unknown revision or path not in the > working tree. > Use '--' to separate paths from revisions, like this: > 'git [...] -- [...]' > masahiro@oscar:~/ref/linux$ git log --oneline -1 06a0d > 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument > > > > > What is interesting is, if you prepend -- > to the abbreviated hash, "git show" accepts as short as a commit > "git log" does. > > masahiro@oscar:~/ref/linux$ git show v5.9-rc5-2-g06a0d | head -1 > commit 06a0df4d1b8b13b551668e47b11fd7629033b7df > > > I guess git cleverly understands it refers to a commit object > since "v5.9-rc5-2-" prefix only makes sense > in the commit context. > Well, yes, but unfortunately only so far as applying the same "the user means a commit object" logic; git doesn't do anything to actually use the prefix to disambiguate. In fact, looking at a semi-random commit in 4.19-stable v4.19.114-7-g236c445eb352: $ git show 236c44 error: short SHA1 236c44 is ambiguous hint: The candidates are: hint: 236c445eb352 commit 2020-03-13 - drm/bochs: downgrade pci_request_region failure from error to warning hint: 236c448cb6e7 commit 2011-09-08 - usbmon vs. tcpdump: fix dropped packet count hint: 236c445b1930 blob Now you're right that we get $ git show v4.19.114-7-g236c445 | head -n1 commit 236c445eb3529aa7c976f8812513c3cb26d77e27 so it knows we're not looking at that blob. But "git show v4.19.114-7-g236c44" still fails because there are two commits. Adding to the fun is that "git show v4.19.114-7-g236c448" actually shows the usbmon commit, which is old v3.2 stuff and certainly not a descendant of v4.19.114. 
I didn't actually know that git would even accept those prefixed strings as possible refspecs, and for a moment you had me hoping that git would actually do the disambiguation using that prefix, which would certainly make 7 hex chars enough; unfortunately that's not the case. > Keeping those above in mind, I believe 15-digits is too long. Well, as you indicated, commits are probably ~1/4 of all objects, i.e. half a hexchar worth of bits. So the commit/ob
Re: [PATCH] scripts/setlocalversion: make git describe output more reliable
On 10/09/2020 16.34, Masahiro Yamada wrote:
> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
> wrote:
>> ...
>>
>> So in order to avoid `uname -a` output relying on such random details
>> of the build environment which are rather hard to ensure are
>> consistent between developers and buildbots, use an explicit
>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
>> for the unlikely case of no signed tags being usable).
>>

Hi Masahiro,

Can I get you to carry this through the kbuild tree? Unless of course your questions/concerns haven't been addressed.

Thanks,
Rasmus
Re: [PATCH] scripts/setlocalversion: make git describe output more reliable
On 10/09/2020 21.05, Brian Norris wrote: > On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada wrote: >> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes >> wrote: >>> So in order to avoid `uname -a` output relying on such random details >>> of the build environment which are rather hard to ensure are >>> consistent between developers and buildbots, use an explicit >>> --abbrev=15 option (and for consistency, also use rev-parse --short=15 >>> for the unlikely case of no signed tags being usable). > > For the patch: > > Reviewed-by: Brian Norris > >> I agree that any randomness should be avoided. >> >> My question is, do we need 15-digits? > ... >> So, I think the conflict happens >> only when we have two commits that start with the same 7-digits >> in the _same_ release. Is this correct? No. > For git-describe (the case where we have a tag to base off): > "use digits, or as many digits as needed to form a unique object name" Yes, the abbreviated hash that `git describe` produces is unique among all objects (and objects are more than just commits) in the current repo, so what matters for probability-of-collision is the total number of objects - linus.git itself probably grows ~6 objects per release. As for "do we need 15 digits", well, in theory the next time I merge the next rt-stable tag into our kernel I could end up with a commit that matches some existing object in the first 33 hex chars at which point I should have chosen 34 - but of course that's so unlikely that it's irrelevant. But the upshot of that is that there really is no objective answer to "how many digits do we need", so it becomes a tradeoff between "low enough probability that anyone anywhere in the next few years would have needed more than X when building their own kernel" and readability of `uname -r` etc. So I decided somewhat arbitrarily that each time one rolls a new release, there should be less than 1e-9 probability that HEAD collides with some other object when abbreviated to X hexchars. 
X=12 doesn't pass that criterion even when the repo has only 10M objects (and, current linus.git already has objects that need 12 to be unique, so such collisions do happen in practice, though of course very rarely). 13 and 14 are just weird numbers, so I ended with 15, which also allows many many more objects in the repo before the probability crosses that 1e-9 threshold.

Rasmus

Side note 1: Note that there really isn't any such thing as "last tag/previous tag" in a DAG; I think what git does is a best-effort search for the eligible tag that minimizes #({objects reachable from commit-to-be-described} - {objects reachable from tag}), where eligible tag means at least reachable from commit-to-be-described (so that set difference is a proper one), but there can be additional constraints (e.g. --match=...).

Side note 2: Linus or Greg releases are actually not interesting here (see the logic in setlocalversion that stops early if we're exactly at an annotated tag) - the whole raison d'etre for setlocalversion is that people do maintain and build non-mainline kernels, and it's extremely useful for `uname -a` to accurately tell just which commit is running on the target.
[PATCH] scripts/setlocalversion: make git describe output more reliable
When building for an embedded target using Yocto, we're sometimes observing that the version string that gets built into vmlinux (and thus what uname -a reports) differs from the path under /lib/modules/ where modules get installed in the rootfs, but only in the length of the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state), which allows different developers/buildbots/etc. to share build artifacts, based on a hash of all the metadata that went into building that artifact - and that metadata includes all dependencies (e.g. the compiler used etc.). That normally works quite well; usually a clean build (without using any sstate cache) done by one developer ends up being binary identical to a build done on another host. However, one thing that can cause two developers to end up with different builds [and thus make one's vmlinux package incompatible with the other's kernel-dev package], which is not captured by the metadata hashing, is this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since 2.11 (git.git commit e6c587) the default is dynamic based on the number of objects in the repo

(2) hence even if both run the same git version, the output can differ based on how many remotes are being tracked (or just lots of local development branches or plain old garbage)

(3) and of course somebody could have a core.abbrev config setting in ~/.gitconfig

So in order to avoid `uname -a` output relying on such random details of the build environment which are rather hard to ensure are consistent between developers and buildbots, use an explicit --abbrev=15 option (and for consistency, also use rev-parse --short=15 for the unlikely case of no signed tags being usable).

Now, why is 60 bits enough for everyone?
It's not mathematically guaranteed that git won't have to use 16 in some git repo, but it is beyond unlikely: Even in a repo with 100M objects, the probability that any given commit (i.e. the one being described) clashes with some other object in the first 15 hex chars is less than 1e-10, and currently a git repo tracking Linus', -stable and -rt only has around 10M objects.

Signed-off-by: Rasmus Villemoes
---
I could probably fix things by adding a 'git config --local core.abbrev 15' step to the Yocto build process after the repo to build from has been cloned but before building has started. But in the interest of binary reproducibility outside of just Yocto builds, I think it's better if this lives in the kernel.

 scripts/setlocalversion | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 20f2efd57b11..c5262f0d953d 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()
 
 	# Check for git and a git repo.
 	if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-	   head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+	   head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then
 
 		# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
 		# it, because this version is defined in the top level Makefile.
@@ -59,7 +59,7 @@ scm_version()
 		fi
 		# If we are past a tagged commit (like
 		# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-		if atag="$(git describe 2>/dev/null)"; then
+		if atag="$(git describe --abbrev=15 2>/dev/null)"; then
 			echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
 
 		# If we don't have a tag at all we print -g{commitish}.
-- 
2.23.0
Re: [PATCH] /dev/zero: also implement ->read
On 07/09/2020 08.20, Christoph Hellwig wrote:
> On Mon, Sep 07, 2020 at 12:34:37AM +0200, Rasmus Villemoes wrote:
>> On 03/09/2020 17.59, Christoph Hellwig wrote:
>
>>> +static ssize_t read_zero(struct file *file, char __user *buf,
>>> +		size_t count, loff_t *ppos)
>>> +{
>>> +	size_t cleared = 0;
>>> +
>>> +	while (count) {
>>> +		size_t chunk = min_t(size_t, count, PAGE_SIZE);
>>> +
>>> +		if (clear_user(buf + cleared, chunk))
>>> +			return cleared ? cleared : -EFAULT;
>>
>> Probably nobody really cares, but currently doing
>>
>>   read(fd, &unmapped_page - 5, 123);
>>
>> returns 5, and those five bytes do get cleared; if I'm reading the above
>> right you'd return -EFAULT for that case.
>>
>>> +		cleared += chunk;
>>> +		count -= chunk;
>>> +
>>> +		if (signal_pending(current))
>>> +			return cleared ? cleared : -ERESTARTSYS;
>>
>> I can't see how we can get here without 'cleared' being positive, so
>> this can just be 'return cleared' (and if you fix the above EFAULT case
>> to more accurately track how much got cleared, there's probably no
>> longer any code to be symmetric with anyway).
>
> Yeah, I'll fix these up and resend.
>

Actually, while you're micro-optimizing it, AFAIK VFS already handles count==0, so you can avoid the initial branch and the last cond_resched() by writing it something like

	while (1) {
		size_t chunk = min_t(size_t, count, PAGE_SIZE), c;

		c = chunk - clear_user(buf + cleared, chunk);
		if (unlikely(!c))
			return cleared ?: -EFAULT;
		cleared += c;
		count -= c;
		if (!count || signal_pending(current))
			return cleared;
		cond_resched();
	}

For the dd test case with the default bs=512 that avoids cond_resched() and signal_pending() altogether.

Rasmus
Re: [PATCH] /dev/zero: also implement ->read
On 03/09/2020 17.59, Christoph Hellwig wrote: > Christophe reported a major speedup due to avoiding the iov_iter > overhead, so just add this trivial function. Note that /dev/zero > already implements both an iter and non-iter writes so this just > makes it more symmetric. > > Christophe Leroy ?-by ? > Signed-off-by: Christoph Hellwig > --- > drivers/char/mem.c | 22 ++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/char/mem.c b/drivers/char/mem.c > index abd4ffdc8cdebc..1dc99ab158457a 100644 > --- a/drivers/char/mem.c > +++ b/drivers/char/mem.c > @@ -726,6 +726,27 @@ static ssize_t read_iter_zero(struct kiocb *iocb, struct > iov_iter *iter) > return written; > } > > +static ssize_t read_zero(struct file *file, char __user *buf, > + size_t count, loff_t *ppos) > +{ > + size_t cleared = 0; > + > + while (count) { > + size_t chunk = min_t(size_t, count, PAGE_SIZE); > + > + if (clear_user(buf + cleared, chunk)) > + return cleared ? cleared : -EFAULT; Probably nobody really cares, but currently doing read(fd, &unmapped_page - 5, 123); returns 5, and those five bytes do get cleared; if I'm reading the above right you'd return -EFAULT for that case. > + cleared += chunk; > + count -= chunk; > + > + if (signal_pending(current)) > + return cleared ? cleared : -ERESTARTSYS; I can't see how we can get here without 'cleared' being positive, so this can just be 'return cleared' (and if you fix the above EFAULT case to more accurately track how much got cleared, there's probably no longer any code to be symmetric with anyway). Rasmus
Re: [PATCH rdma-next 2/4] gcov: Use proper duplication routine for const pointer
On 02/09/2020 10.55, Leon Romanovsky wrote: > From: Leon Romanovsky > > The filename is a const pointer, so use the proper string duplication > routine that takes into account const identifier. This commit log makes no sense at all. kstrdup_const is merely an optimization that can be used when there's a good chance that the passed string lives in vmlinux' .rodata, in which case it is known to be immortal, and we can avoid allocating heap memory to contain a duplicate. [It also requires that the caller has no intention of modifying the returned string.] In the case of something called ->filename, I assume it's initialized with __FILE__ somewhere, making the above true for built-in stuff but not for modules. So if the gcov_info can live longer than the module, it's of course necessary to duplicate the string, but OTOH making an optimization for the built-in stuff makes sense. So this is certainly one of the places where kstrdup_const() seems applicable. But it has nothing whatsoever to do with the C-level qualifiers the argument may have. Rasmus
Re: [PATCH 2/2] mtd: spi-nor: enable lock interface for macronix chips
On 12/08/2020 17.18, Ivan Mikhaylov wrote: > Add locks for whole macronix chip series with BP0-2 and BP0-3 bits. > > Tested with mx25l51245g(BP0-3). Hmm. I've tried adding support for locking on Macronix to U-Boot (https://patchwork.ozlabs.org/project/uboot/patch/20200326114257.1782-3-rasmus.villem...@prevas.dk/), but that was quite a bit more involved than this. Note in particular the first part of my commit message: Macronix chips implements locking in (power-of-two multiple of) 64K blocks, not as a fraction of the chip's size. At least, that was true for the chip I was interested in and the few others whose data sheets I grabbed to double-check. So I'm a bit skeptical that this can work out-of-the-box without introducing a new struct spi_nor_locking_ops. Rasmus
Re: [PATCH] linux/kernel.h: add container_from()
On 27/08/2020 03.36, Allen Pais wrote:
> Introduce container_from() as a generic helper instead of
> sub-systems defining a private from_* API
> (Eg: from_tasklets recently introduced in
> 12cc923f1ccc: Tasklet: Introduce new initialization API)
>
> The helper is similar to container_of() in argument order
> with the difference of naming the containing structure instead
                                               ^^^
> of having to specify its type.

> +/**
> + * container_from - cast a member of a structure out to the containing structure
> + * @ptr: the pointer to the member.
> + * @container: the type of the container struct.
                  ^

This seems to have been copy-pasted from container_of? Shouldn't @container be the (local) value we're storing into? As in foo = container_from(..., foo, ...)? Or am I misunderstanding the purpose of this?

[And I think it would read nicer if the bikeshed was called to_container(), but don't care deeply.]

Rasmus
Re: [PATCH] usb: atm: don't use snprintf() for sysfs attrs
On 27/08/2020 15.18, Alex Dewar wrote: > On Thu, Aug 27, 2020 at 09:15:37AM +0200, Greg Kroah-Hartman wrote: >> On Thu, Aug 27, 2020 at 08:42:06AM +0200, Rasmus Villemoes wrote: >>> On 25/08/2020 00.23, Alex Dewar wrote: >>>> kernel/cpu.c: don't use snprintf() for sysfs attrs >>>> >>>> As per the documentation (Documentation/filesystems/sysfs.rst), >>>> snprintf() should not be used for formatting values returned by sysfs. >>>> >>> >>> Can we have a sysfs_sprintf() (could just be a macro that does sprintf) >>> to make it clear to the next reader that we know we're in a sysfs show >>> method? It would make auditing uses of sprintf() much easier. >> >> Code churn to keep code checkers quiet for pointless reasons? What >> could go wrong with that... I did not (mean to) suggest replacing existing sprintf() calls in sysfs show methods. But when changes _are_ being made, such as when replacing snprintf() calls for whatever reasons, can we please not make it harder for people doing manual audits (those are "code checkers" as well, I suppose, but they do tend to only make noise when finding something). >> It should be pretty obvious to any reader that you are in a sysfs show >> method, as almost all of them are trivially tiny and obvious. git grep doesn't immediately show that, not even with a suitable -C argument, as you can't really know the potential callers unless you open the file and see that the function is only assigned as a .show method. And even that can be a pain because it's all hidden behind five levels of magic macros that build identifiers with ##. > Perhaps I should have mentioned this in the commit message, but the problem > is that snprintf() doesn't return the number of bytes written to the > destination buffer, I'm perfectly well aware of that, TYVM (you may want to 'git log --author Villemoes lib/vsprintf.c'). but the number of bytes that *would have been written if > they fitted*, which may be more than the bounds specified [1]. 
So "return > snprintf(...)" for sysfs attributes is an antipattern. If you need bounded > string ops, scnprintf() is the way to go. Using snprintf() can give a > false sense of security, because it isn't necessarily safe. Huh? This all seems utterly irrelevant WRT a change that replaces PAGE_SIZE by INT_MAX (because that's what sprintf() is going to pretend you passed). You get the same return value. But I'm not at all concerned about whether one passes the proper buffer size or not in sysfs show methods; with my embedded hat on, I'm all for saving a few bytes of .text here and there. The problem, as far as I'm concerned, is merely that adding sprintf() callers makes it harder to find the problematic sprintf() instances. >> Anyway, we've had this for 20 years, My bad, for a moment I forgot that code and patterns of that vintage cannot have bugs. Rasmus
Re: [PATCH] usb: atm: don't use snprintf() for sysfs attrs
On 25/08/2020 00.23, Alex Dewar wrote:
> kernel/cpu.c: don't use snprintf() for sysfs attrs
>
> As per the documentation (Documentation/filesystems/sysfs.rst),
> snprintf() should not be used for formatting values returned by sysfs.
>

Sure. But then the security guys come along and send a patch saying "sprintf is evil, always pass a buffer bound". Perhaps with a side of "this code could get copy-pasted, better not promote the use of sprintf more than strictly necessary".

Can we have a sysfs_sprintf() (could just be a macro that does sprintf) to make it clear to the next reader that we know we're in a sysfs show method? It would make auditing uses of sprintf() much easier.

>  static ssize_t cxacru_sysfs_showattr_LINE(u32 value, char *buf)
> @@ -275,8 +275,8 @@ static ssize_t cxacru_sysfs_showattr_LINE(u32 value, char *buf)
>  		"waiting", "initialising"
>  	};
>  	if (unlikely(value >= ARRAY_SIZE(str)))
> -		return snprintf(buf, PAGE_SIZE, "%u\n", value);
> -	return snprintf(buf, PAGE_SIZE, "%s\n", str[value]);
> +		return sprintf(buf, "%u\n", value);
> +	return sprintf(buf, "%s\n", str[value]);
>  }

Not this patch in particular, but in some cases the string being printed is not immediately adjacent to the sprintf call, making it rather hard to subsequently verify that yes, that string is certainly reasonably bounded. If you really want to save the few bytes of code that is the practical effect of eliding the PAGE_SIZE argument, how about a sysfs_print_string(buf, str) helper that prints the string and appends a newline; that's another argument saved. And again it would make it obvious to a reader that that particular helper is only to be used in sysfs show methods.

Rasmus
Re: [PATCH] checkpatch: Allow not using -f with files that are in git
On 25/08/2020 02.09, Joe Perches wrote:
> If a file exists in git and checkpatch is used without the -f
> flag for scanning a file, then checkpatch will scan the file
> assuming it's a patch and emit:
>
> ERROR: Does not appear to be a unified-diff format patch
>
> Change the behavior to assume the -f flag if the file exists
> in git.

Heh, I read the patch subject to mean you introduced a way for subsystem maintainers to prevent running checkpatch -f on their files, which I think some would like ;)

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 79fc357b18cd..cdee7cfadc11 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -976,6 +976,16 @@ sub seed_camelcase_includes {
>  	}
>  }
>
> +sub git_is_single_file {
> +	my ($filename) = @_;
> +
> +	return 0 if ((which("git") eq "") || !(-e "$gitroot"));
> +
> +	my $output = `${git_command} ls-files -- $filename`;
> +	my $count = $output =~ tr/\n//;
> +	return $count eq 1 && $output =~ m{^${filename}$};
> +}

Isn't that somewhat expensive to do for each file? Why not postpone that check till we're about to complain that the file is not a diff (haven't looked at how such a refactoring would look).

Rasmus
Re: [PATCH] lib: Convert test_printf.c to KUnit
On 21/08/2020 13.37, Petr Mladek wrote: > On Mon 2020-08-17 09:06:32, Rasmus Villemoes wrote: >> On 17/08/2020 06.30, Arpitha Raghunandan wrote: >>> Converts test lib/test_printf.c to KUnit. >>> More information about KUnit can be found at >>> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html. >>> KUnit provides a common framework for unit tests in the kernel. >> >> So I can continue to build a kernel with some appropriate CONFIG set to >> y, boot it under virt-me, run dmesg and see if I broke printf? That's >> what I do now, and I don't want to have to start using some enterprisy >> framework. > > I had the same concern. I have tried it. Thanks for doing that and reporting the results. > #> modprobe printf_kunit > > produced the following in dmesg: > > [ 60.931175] printf_kunit: module verification failed: signature and/or > required key missing - tainting kernel > [ 60.942209] TAP version 14 > [ 60.945197] # Subtest: printf-kunit-test > [ 60.945200] 1..1 > [ 60.951092] ok 1 - selftest > [ 60.953414] ok 1 - printf-kunit-test > > I could live with the above. 
Then I tried to break a test by the following > change: > > > diff --git a/lib/printf_kunit.c b/lib/printf_kunit.c > index 68ac5f9b8d28..1689dadd70a3 100644 > --- a/lib/printf_kunit.c > +++ b/lib/printf_kunit.c > @@ -395,7 +395,7 @@ ip4(struct kunit *kunittest) > sa.sin_port = cpu_to_be16(12345); > sa.sin_addr.s_addr = cpu_to_be32(0x7f01); > > - test(kunittest, "127.000.000.001|127.0.0.1", "%pi4|%pI4", > &sa.sin_addr, &sa.sin_addr); > + test(kunittest, "127-000.000.001|127.0.0.1", "%pi4|%pI4", > &sa.sin_addr, &sa.sin_addr); > test(kunittest, "127.000.000.001|127.0.0.1", "%piS|%pIS", &sa, &sa); > sa.sin_addr.s_addr = cpu_to_be32(0x01020304); > test(kunittest, "001.002.003.004:12345|1.2.3.4:12345", "%piSp|%pISp", > &sa, &sa); > > > It produced:: > > [ 56.786858] TAP version 14 > [ 56.787493] # Subtest: printf-kunit-test > [ 56.787494] 1..1 > [ 56.788612] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76 >Expected memcmp(test_buffer, expect, written) == 0, but >memcmp(test_buffer, expect, written) == 1 >0 == 0 >vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote > '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' > [ 56.795433] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76 >Expected memcmp(test_buffer, expect, written) == 0, but >memcmp(test_buffer, expect, written) == 1 >0 == 0 >vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote > '127.000.000.001|127', expected '127-000.000.001|127' > [ 56.800909] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:108 >Expected memcmp(p, expect, elen+1) == 0, but >memcmp(p, expect, elen+1) == 1 >0 == 0 >kvasprintf(..., "%pi4|%pI4", ...) returned > '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' > [ 56.806497] not ok 1 - selftest > [ 56.806497] not ok 1 - printf-kunit-test > > while the original code would have written the following error messages: > > [ 95.859225] test_printf: loaded. > [ 95.860031] test_printf: vsnprintf(buf, 256, "%pi4|%pI4", ...) 
wrote > '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' > [ 95.862630] test_printf: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote > '127.0', expected '127-0' > [ 95.864118] test_printf: kvasprintf(..., "%pi4|%pI4", ...) returned > '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1' > [ 95.866589] test_printf: failed 3 out of 388 tests > > > Even the error output is acceptable for me. Urgh. Yeah, perhaps, but the original is much more readable; it really doesn't matter that a memcmp() fails to compare equal to 0, that's merely how we figured out that the output was wrong. I am just curious why > the 2nd failure is different: > >+ original code: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote '127.0', > expected '127-0' >+ kunit code: vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote > '127.000.000.001|127', expected
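For reference, the comparison pattern that both the module and the KUnit version rely on can be sketched in plain userspace C (a simplified analogue of test_printf's do_test(); the helper name and fixed buffer size are illustrative, not the kernel's):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format into a buffer and compare against the expected output, the way
 * test_printf's do_test() exercises the kernel's vsnprintf(). Returns 1
 * on match, 0 on mismatch. */
static int check_fmt(const char *expect, const char *fmt, ...)
{
	char buf[256];
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);

	/* Both the return value (length) and the content must match. */
	return ret == (int)strlen(expect) && strcmp(buf, expect) == 0;
}
```

The kernel version additionally repeats the test with a random short buffer and via kvasprintf(), which is where the second failure line in the logs above comes from.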
Re: [PATCH 0/4] -ffreestanding/-fno-builtin-* patches
On 20/08/2020 19.56, Arvind Sankar wrote: > On Thu, Aug 20, 2020 at 04:56:02PM +0200, Rasmus Villemoes wrote: >> On 18/08/2020 23.41, Arvind Sankar wrote: >>> >>> Note that -fno-builtin-foo seems to mean slightly different things in >>> clang and gcc. From experimentation, clang will neither optimize a call >>> to foo, nor perform an optimization that introduces a call to foo. gcc >>> will avoid optimizing calls to foo, but it can still generate new calls >>> to foo while optimizing something else. Which means that >>> -fno-builtin-{bcmp,stpcpy} only solves things for clang, not gcc. It's >>> just that gcc doesn't seem to have implemented those optimizations. >>> >> >> I think it's more than that. I've always read gcc's documentation >> >> '-fno-builtin' >> '-fno-builtin-FUNCTION' >> Don't recognize built-in functions that do not begin with >> '__builtin_' as prefix. ... >> >> GCC normally generates special code to handle certain built-in >> functions more efficiently; for instance, calls to 'alloca' may >> become single instructions which adjust the stack directly, and >> calls to 'memcpy' may become inline copy loops. >> ... >> >> to mean exactly that observed above and nothing more, i.e. that >> -fno-builtin-foo merely means that gcc stops treating a call of a >> function named foo to mean a call to a function implementing the >> standard function by that name (and hence allows it to e.g. replace a >> memcpy(d, s, 1) by byte load+store). It does not mean to prevent >> emitting calls to foo, and I don't think it ever will - it's a bit sad >> that clang has chosen to interpret these options differently. > > That documentation is misleading, as it also goes on to say: > "...nor can you change the behavior of the functions by linking with a > different library" > which implies that you _can_ change the behavior if you use the option, > and which is what your "i.e." is saying as well. 
> > My point is that this is not completely true: in gcc, foo by default is > defined to be __builtin_foo, and -fno-builtin-foo simply removes this > definition. So the effect is just that calls to foo in the original > source will be left alone. Yes, this is a much better way of putting it. And with -fbuiltin-foo in effect, the compiler just needs to transform the code in some way as-if the standard function by that name was called, which it can of course decide to implement by emitting such a call, but it can also open-code it - or synthesize it using other std functions. > But in order for an optimization that introduces a new call to foo to be > valid, foo _must_ have standard semantics: strchr(s,'\0') is not s + > strlen(s) unless strlen has standard semantics. Correct. So I agree that -fno-builtin-strlen should prevent the compiler from generating calls to strlen() that don't appear in the code. This is an oversight in > gcc's optimizations: it converts to s + __builtin_strlen(s), which then > (normally) becomes s + strlen(s). > > Check out this horror: https://godbolt.org/z/a1r9fK > > Clang will disable this optimization if -fno-builtin-strlen is > specified. > > Clang's interpretation is more useful for embedded, since you can use > -fno-builtin-foo and avoid calling __builtin_foo directly, and be > guaranteed that there will be no calls to foo that you didn't write > explicitly (outside of memcpy/memset/memcmp). In this case you are free > to implement foo with non-standard semantics, or avoid implementing it > altogether, and be reasonably confident that it will all work. Yeah, except that the list of -fno-builtin-foo one would have to pass is enormous, so for targets with a somewhat wonky libc, I'd much rather be able to do a blanket -fno-builtin, and then manually check their memcpy, memset and memcmp implementations and opt back in with -fbuiltin-mem{cpy,set,cmp} so that small constant-size memcpys do get properly open-coded. 
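The strchr-to-strlen equivalence discussed here is easy to state in code (a userspace illustration; the point is that the rewrite is only legal when strlen() has standard semantics):

```c
#include <string.h>

/* strchr(s, '\0') returns a pointer to the terminating NUL, which is
 * exactly s + strlen(s). gcc may therefore compile this call as
 * s + strlen(s) - introducing a strlen() call that never appears in the
 * source, which is the behavior -fno-builtin-strlen does not suppress
 * in gcc (but does in clang). */
const char *end_of_string(const char *s)
{
	return strchr(s, '\0');
}
```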
The advice in gcc's documentation of just #defining memcpy() to __builtin_memcpy() doesn't work in the real world (for example it breaks C++ code that uses std::memcpy(...)). >> Thinking out loud, it would be useful if both compilers grew >> -fassume-provided-std-foo >> and >> -fno-assume-provided-std-foo >> options to tell the compiler that a function named foo with standard >> semantics can be assumed (or not) to be provided by the execution >> environment; i.e. one half of what -f(no-)builtin-foo apparently does &
Re: [PATCH 0/4] -ffreestanding/-fno-builtin-* patches
On 18/08/2020 23.41, Arvind Sankar wrote: > On Tue, Aug 18, 2020 at 01:58:51PM -0700, Nick Desaulniers wrote: >> On Tue, Aug 18, 2020 at 1:27 PM Nick Desaulniers >> wrote: >>> >>> On Tue, Aug 18, 2020 at 1:24 PM Arvind Sankar wrote: On Tue, Aug 18, 2020 at 12:13:22PM -0700, Linus Torvalds wrote: > On Tue, Aug 18, 2020 at 12:03 PM H. Peter Anvin wrote: >> >> I'm not saying "change the semantics", nor am I saying that playing >> whack-a-mole *for a limited time* is unreasonable. But I would like to >> go back >> to the compiler authors and get them to implement such a #pragma: "this >> freestanding implementation *does* support *this specific library >> function*, >> and you are free to call it." > > I'd much rather just see the library functions as builtins that always > do the right thing (with the fallback being "just call the standard > function"). > > IOW, there's nothing wrong with -ffreestanding if you then also have > __builtin_memcpy() etc, and they do the sane compiler optimizations > for memcpy(). > > What we want to avoid is the compiler making *assumptions* based on > standard names, because we may implement some of those things > differently. > -ffreestanding as it stands today does have __builtin_memcpy and friends. But you need to then use #define memcpy __builtin_memcpy etc, which is messy and also doesn't fully express what you want. #pragma, or even just allowing -fbuiltin-foo options would be useful. >> >> I do really like the idea of -fbuiltin-foo. For example, you'd specify: >> >> -ffreestanding -fbuiltin-bcmp >> >> as an example. `-ffreestanding` would opt you out of ALL libcall >> optimizations, `-fbuiltin-bcmp` would then opt you back in to >> transforms that produce bcmp. That way you're informing the compiler >> more precisely about the environment you'd be targeting. It feels >> symmetric to existing `-fno-` flags (clang makes -f vs -fno- pretty >> easy when there is such symmetry). 
And it's already convention that >> if you specify multiple conflicting compiler flags, then the latter >> one specified "wins." In that sense, turning back on specific >> libcalls after disabling the rest looks more ergonomic to me. >> >> Maybe Eli or David have thoughts on why that may or may not be as >> ergonomic or possible to implement as I imagine? >> > > Note that -fno-builtin-foo seems to mean slightly different things in > clang and gcc. From experimentation, clang will neither optimize a call > to foo, nor perform an optimization that introduces a call to foo. gcc > will avoid optimizing calls to foo, but it can still generate new calls > to foo while optimizing something else. Which means that > -fno-builtin-{bcmp,stpcpy} only solves things for clang, not gcc. It's > just that gcc doesn't seem to have implemented those optimizations. > I think it's more than that. I've always read gcc's documentation '-fno-builtin' '-fno-builtin-FUNCTION' Don't recognize built-in functions that do not begin with '__builtin_' as prefix. ... GCC normally generates special code to handle certain built-in functions more efficiently; for instance, calls to 'alloca' may become single instructions which adjust the stack directly, and calls to 'memcpy' may become inline copy loops. ... to mean exactly that observed above and nothing more, i.e. that -fno-builtin-foo merely means that gcc stops treating a call of a function named foo to mean a call to a function implementing the standard function by that name (and hence allows it to e.g. replace a memcpy(d, s, 1) by byte load+store). It does not mean to prevent emitting calls to foo, and I don't think it ever will - it's a bit sad that clang has chosen to interpret these options differently. 
Thinking out loud, it would be useful if both compilers grew -fassume-provided-std-foo and -fno-assume-provided-std-foo options to tell the compiler that a function named foo with standard semantics can be assumed (or not) to be provided by the execution environment; i.e. one half of what -f(no-)builtin-foo apparently does for clang currently. And yes, the positive -fbuiltin-foo would also be quite useful in order to get the compiler to recognize a few important functions (memcpy, memcmp) while using -ffreestanding (or just plain -fno-builtin) to tell it to avoid assuming anything about most std functions - I've worked on a VxWorks target where snprintf() didn't have the correct "return what would be written" semantics but rather behaved like the kernel's non-standard scnprintf(), and who knows what other odd quirks that libc had. Rasmus
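As a userspace illustration of the snprintf/scnprintf difference mentioned above (a hypothetical analogue of the kernel's scnprintf(); the real helper lives in lib/vsprintf.c):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* scnprintf-style semantics: return the number of characters actually
 * written into buf (excluding the NUL), rather than the number that
 * *would* have been written given unlimited space, which is what
 * standard snprintf returns. */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (ret < 0)
		return 0;
	if ((size_t)ret >= size)	/* output was truncated */
		ret = size ? (int)size - 1 : 0;
	return ret;
}
```

A libc that behaves like the second helper under the name snprintf() is exactly the kind of non-standard environment where assuming builtin semantics goes wrong.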
Re: [PATCH v2] overflow: Add __must_check attribute to check_*() helpers
On 17/08/2020 11.08, David Sterba wrote: > On Sat, Aug 15, 2020 at 10:09:24AM -0700, Kees Cook wrote: >> >> +/* >> + * Allows for effectively applying __must_check to a macro so we can have >> + * both the type-agnostic benefits of the macros while also being able to >> + * enforce that the return value is, in fact, checked. >> + */ >> +static inline bool __must_check __must_check_overflow(bool overflow) >> +{ >> +return unlikely(overflow); > > How does the 'unlikely' hint propagate through return? It is in a static > inline so compiler has complete information in order to use it, but I'm > curious if it actually does. I wondered the same thing, but as I noted in a reply in the v1 thread, that pattern is used in kernel/sched/, and the scheduler is a far more critical path than anywhere these might be used, so if it's good enough for kernel/sched/, it should be good enough here. I have no idea how one could write a piece of non-trivial code to see if the hint actually makes a difference. > > In case the hint gets dropped, the fix would probably be > > #define check_add_overflow(a, b, d) unlikely(__must_check_overflow(({ \ > typeof(a) __a = (a);\ > typeof(b) __b = (b);\ > typeof(d) __d = (d);\ > (void) (&__a == &__b); \ > (void) (&__a == __d); \ > __builtin_add_overflow(__a, __b, __d); \ > }))) > Well, maybe, but I'd be a little worried that the !! that unlikely() slaps on its argument may count as a use of that argument, hence nullifying the __must_check which is the main point - the unlikely just being something we can add for free while touching this code. Haven't tested it, though. Rasmus
Re: [PATCH 08/30] net: wireless: ath: carl9170: Mark 'ar9170_qmap' as __maybe_unused
On 14/08/2020 17.14, Christian Lamparter wrote: > On 2020-08-14 13:39, Lee Jones wrote: >> 'ar9170_qmap' is used in some source files which include carl9170.h, >> but not all of them. Mark it as __maybe_unused to show that this is >> not only okay, it's expected. >> >> Fixes the following W=1 kernel build warning(s) > > Is this W=1 really a "must" requirement? I find it strange having > __maybe_unused in header files as this "suggests" that the > definition is redundant. In this case it seems one could replace the table lookup with a static inline u8 ar9170_qmap(u8 idx) { return 3 - idx; } gcc doesn't warn about unused static inline functions (or one would have a million warnings to deal with). Just my $0.02. Rasmus
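A quick userspace check that the suggested inline matches a 3 - idx lookup table (the table contents here are inferred from the 3 - idx relationship in the mail, not copied from carl9170.h):

```c
/* Hypothetical stand-in for the driver's lookup table, assuming the
 * values follow the 3 - idx pattern the mail derives. */
static const unsigned char ar9170_qmap_table[4] = { 3, 2, 1, 0 };

/* The suggested replacement: a static inline, which gcc never warns
 * about when unused, unlike an unused static table in a header. */
static unsigned char ar9170_qmap(unsigned char idx)
{
	return 3 - idx;
}
```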
Re: [PATCH] lib: Convert test_printf.c to KUnit
On 17/08/2020 06.30, Arpitha Raghunandan wrote: > Converts test lib/test_printf.c to KUnit. > More information about KUnit can be found at > https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html. > KUnit provides a common framework for unit tests in the kernel. So I can continue to build a kernel with some appropriate CONFIG set to y, boot it under virt-me, run dmesg and see if I broke printf? That's what I do now, and I don't want to have to start using some enterprisy framework. > diff --git a/lib/test_printf.c b/lib/printf_kunit.c > similarity index 45% > rename from lib/test_printf.c > rename to lib/printf_kunit.c > index 7ac87f18a10f..68ac5f9b8d28 100644 > --- a/lib/test_printf.c > +++ b/lib/printf_kunit.c > @@ -5,6 +5,7 @@ > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > +#include > #include > #include > #include > @@ -30,79 +31,61 @@ > #define PAD_SIZE 16 > #define FILL_CHAR '$' > > -static unsigned total_tests __initdata; > -static unsigned failed_tests __initdata; > -static char *test_buffer __initdata; > -static char *alloced_buffer __initdata; > +static char *test_buffer; > +static char *alloced_buffer; > > -static int __printf(4, 0) __init > -do_test(int bufsize, const char *expect, int elen, > +static void __printf(5, 0) > +do_test(struct kunit *kunittest, int bufsize, const char *expect, int elen, > const char *fmt, va_list ap) > { > va_list aq; > int ret, written; > > - total_tests++; > - > memset(alloced_buffer, FILL_CHAR, BUF_SIZE + 2*PAD_SIZE); > va_copy(aq, ap); > ret = vsnprintf(test_buffer, bufsize, fmt, aq); > va_end(aq); > > - if (ret != elen) { > - pr_warn("vsnprintf(buf, %d, \"%s\", ...) returned %d, expected > %d\n", > + KUNIT_EXPECT_EQ_MSG(kunittest, ret, elen, > + "vsnprintf(buf, %d, \"%s\", ...) 
returned %d, expected > %d\n", > bufsize, fmt, ret, elen); > - return 1; > - } IIRC, some of these early returns are required to ensure the following checks do not fail (as in, potentially crash the kernel) simply because they go off into the weeds. Please double-check that they are all safe to continue to perform (though, another reason I might have put them in is to simply avoid lots of useless collateral). > - if (memchr_inv(alloced_buffer, FILL_CHAR, PAD_SIZE)) { > + KUNIT_EXPECT_EQ_MSG(kunittest, memchr_inv(alloced_buffer, FILL_CHAR, > PAD_SIZE), NULL, > - if (memchr_inv(test_buffer, FILL_CHAR, BUF_SIZE + PAD_SIZE)) { > + KUNIT_EXPECT_FALSE_MSG(kunittest, > - if (memchr_inv(test_buffer + written + 1, FILL_CHAR, BUF_SIZE + > PAD_SIZE - (written + 1))) { > + KUNIT_EXPECT_FALSE_MSG(kunittest, > + memchr_inv(test_buffer + written + 1, FILL_CHAR, > BUF_SIZE + PAD_SIZE - (written + 1)) Why the inconsistency in what a memchr_inv != NULL check gets converted to? > > -static void __printf(3, 4) __init > -__test(const char *expect, int elen, const char *fmt, ...) > +static void __printf(4, 5) > +__test(struct kunit *kunittest, const char *expect, int elen, const char > *fmt, ...) > { > va_list ap; > int rand; > char *p; > > - if (elen >= BUF_SIZE) { > - pr_err("error in test suite: expected output length %d too > long. Format was '%s'.\n", > -elen, fmt); > - failed_tests++; > - return; > - } > + KUNIT_EXPECT_LT_MSG(kunittest, elen, BUF_SIZE, > + "error in test suite: expected output length %d too > long. Format was '%s'.\n", > + elen, fmt); And it's ok to continue with the tests when the test suite itself is buggy because? [*] > va_start(ap, fmt); > > @@ -112,49 +95,46 @@ __test(const char *expect, int elen, const char *fmt, > ...) >* enough and 0), and then we also test that kvasprintf would >* be able to print it as expected. 
>*/ > - failed_tests += do_test(BUF_SIZE, expect, elen, fmt, ap); > + do_test(kunittest, BUF_SIZE, expect, elen, fmt, ap); > rand = 1 + prandom_u32_max(elen+1); > /* Since elen < BUF_SIZE, we have 1 <= rand <= BUF_SIZE. */ > - failed_tests += do_test(rand, expect, elen, fmt, ap); [*] Certainly this invariant gets violated, so we (may) provide do_test with a buffer size larger than, well, BUF_SIZE. > > -#define test(expect, fmt, ...) \ > - __test(expect, strlen(expect), fmt, ##__VA_ARGS__) > +#define test(kunittest, expect, fmt, ...) > \ > + __test(kunittest, expect, strlen(expect), fmt, ##__VA_ARGS__) > > -static void __init > -test_basic(void) > +static void > +test_basic(struct kunit *kunittest) > { > /* Work around annoying "warning: zero-length gnu_printf format > string". */ > char nul = '\0'; > > - test("", &nul); > - test("100%"
Re: [PATCH] overflow: Add __must_check attribute to check_*() helpers
On 13/08/2020 13.23, Matthew Wilcox wrote: > On Wed, Aug 12, 2020 at 02:51:52PM -0700, Kees Cook wrote: >> +/* >> + * Allows to effectively us apply __must_check to a macro so we can have >> + * both the type-agnostic benefits of the macros while also being able to >> + * enforce that the return value is, in fact, checked. >> + */ >> +static inline bool __must_check __must_check_bool(bool condition) >> +{ >> +return unlikely(condition); >> +} > > I'm fine with the concept, but this is a weirdly-generically-named > function that has a very specific unlikely() in it. So I'd call > this __must_check_overflow() and then it's obvious that overflow is > unlikely(), whereas it's not obvious that __must_check_bool() is going > to be unlikely(). Incidentally, __must_check_overflow was what was actually Suggested-by me - though I didn't think too hard about that name, I certainly agree with your reasoning. I still don't know if (un)likely annotations actually matter when used this way, but at least the same pattern is used in kernel/sched/, so probably. Rasmus
Re: [PATCH RT 1/6] signal: Prevent double-free of user struct
On 13/08/2020 03.45, Steven Rostedt wrote: > 5.4.54-rt33-rc1 stable review patch. > If anyone has any objections, please let me know. > No objections, quite the contrary. I think this should also be applied to 4.19-rt: Commit fda31c50292a is also in 4.19.y (as 797479da0ae9), since 4.19.112 and hence also 4.19.112-rt47. For a while we've tried to track down a hang that at least sometimes manifests quite similarly refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 14 at lib/refcount.c:280 refcount_dec_not_one+0xc0/0xd8 ... Call Trace: [cf45be10] [c0238258] refcount_dec_not_one+0xc0/0xd8 (unreliable) [cf45be20] [c02383c8] refcount_dec_and_lock_irqsave+0x20/0xa4 [cf45be40] [c0024a70] free_uid+0x2c/0xa0 [cf45be60] [c00384f0] put_cred_rcu+0x58/0x8c [cf45be70] [c005f048] rcu_cpu_kthread+0x364/0x49c [cf45bee0] [c003a0d0] smpboot_thread_fn+0x21c/0x29c [cf45bf10] [c0036464] kthread+0xe0/0x10c [cf45bf40] [c000f1cc] ret_from_kernel_thread+0x14/0x1c But our reproducer is rather complicated and involves cutting power to neighbouring boards, and takes many minutes to trigger. So I tried Daniel's reproducer sigwaittest -t -a -p 98 and almost immediately got a trace much more similar to the one in the commit message refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1526 at lib/refcount.c:280 refcount_dec_not_one+0xc0/0xd8 ... Call Trace: [cebc9e00] [c0238258] refcount_dec_not_one+0xc0/0xd8 (unreliable) [cebc9e10] [c02383c8] refcount_dec_and_lock_irqsave+0x20/0xa4 [cebc9e30] [c0024a70] free_uid+0x2c/0xa0 [cebc9e50] [c002574c] dequeue_signal+0x90/0x1a4 [cebc9e80] [c0028f74] sys_rt_sigtimedwait+0x24c/0x288 [cebc9f40] [c000f12c] ret_from_syscall+0x0/0x40 With this patch applied, the sigwaittest has now run for 10 minutes without problems. I'll have to run some more tests with our reproducer to see if it really is the same issue, but even if not, the fact that the sigwaittest fails should be enough to put this in 4.19-rt. 
Three requests (in order of importance): * pull this into 4.19-rt * add a note about the sigwaittest reproducer to the commit log * do publish the rt-rcs in some git repository; that makes it a lot easier to cherry-pick and test patches Thanks, Rasmus
Re: [PATCH] overflow: Add __must_check attribute to check_*() helpers
On 12/08/2020 23.51, Kees Cook wrote: > Since the destination variable of the check_*_overflow() helpers will > contain a wrapped value on failure, it would be best to make sure callers > really did check the return result of the helper. Adjust the macros to use > a bool-wrapping static inline that is marked with __must_check. This means > the macros can continue to have their type-agnostic behavior while gaining > the function attribute (that cannot be applied directly to macros). > > Suggested-by: Rasmus Villemoes > Signed-off-by: Kees Cook > --- > include/linux/overflow.h | 51 +++- > 1 file changed, 30 insertions(+), 21 deletions(-) > > diff --git a/include/linux/overflow.h b/include/linux/overflow.h > index 93fcef105061..ef7d538c2d08 100644 > --- a/include/linux/overflow.h > +++ b/include/linux/overflow.h > @@ -43,6 +43,16 @@ > #define is_non_negative(a) ((a) > 0 || (a) == 0) > #define is_negative(a) (!(is_non_negative(a))) > > +/* > + * Allows to effectively us apply __must_check to a macro so we can have word ordering? > + * both the type-agnostic benefits of the macros while also being able to > + * enforce that the return value is, in fact, checked. > + */ > +static inline bool __must_check __must_check_bool(bool condition) > +{ > + return unlikely(condition); > +} > + > #ifdef COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW > /* > * For simplicity and code hygiene, the fallback code below insists on > @@ -52,32 +62,32 @@ > * alias for __builtin_add_overflow, but add type checks similar to > * below. > */ > -#define check_add_overflow(a, b, d) ({ \ > +#define check_add_overflow(a, b, d) __must_check_bool(({ \ > typeof(a) __a = (a);\ > typeof(b) __b = (b);\ > typeof(d) __d = (d);\ > (void) (&__a == &__b); \ > (void) (&__a == __d); \ > __builtin_add_overflow(__a, __b, __d); \ > -}) > +})) Sorry, I meant to send this before your cooking was done but forgot about it again. 
Not a big deal, but it occurred to me it might be better to rename the existing check_*_overflow to __check_*_overflow (in both branches of the COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW), and then #define check_*_overflow(a, b, d) __must_check_bool(__check_*_overflow(a, b, d)) Mostly because it gives less whitespace churn, but it might also be handy to have the dunder versions available (if nothing else then perhaps in some test code). But as I said, no biggie, I'm fine either way. Now I'm just curious if 0-day is going to find some warning introduced by this :) Rasmus
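A userspace sketch of the suggested shape, using the compiler's warn_unused_result attribute directly in place of the kernel's __must_check (names and the single add case are illustrative, and the fallback type-checking dance is omitted):

```c
#include <limits.h>
#include <stdbool.h>

/* __must_check analogue: discarding the result of a call through this
 * wrapper triggers -Wunused-result. __builtin_expect() stands in for
 * the kernel's unlikely(). */
static inline bool __attribute__((warn_unused_result))
__must_check_overflow(bool overflow)
{
	return __builtin_expect(overflow, 0);
}

/* Dunder version kept available on its own... */
#define __check_add_overflow(a, b, d)	__builtin_add_overflow(a, b, d)

/* ...and the public name wraps it, gaining __must_check with minimal
 * whitespace churn, as suggested above. */
#define check_add_overflow(a, b, d) \
	__must_check_overflow(__check_add_overflow(a, b, d))
```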
Re: [PATCH linux-5.2.y-rt only] hrtimer: correct the logic for grab expiry lock
On 12/08/2020 12.50, zhantao.t...@windriver.com wrote: > From: Zhantao Tang > > In commit: 47b6de0b7f22 ("hrtimer: Add a missing bracket and hide > `migration_base' on !SMP") > a inline function is_migration_base() is introduced. But > the logic of the hrtimer_grab_expiry_lock was changed. > > This patch is to correct it. > Yup, same patch sent back in April, which also had a fixes tag for 5.2. https://lore.kernel.org/lkml/20200428144026.5882-1-rasmus.villem...@prevas.dk/ It got picked up for 4.19-rt, dunno why it wasn't for 5.2-rt. Rasmus
Re: [RFC] saturate check_*_overflow() output?
On 04/08/2020 21.23, Kees Cook wrote: > On Tue, Aug 04, 2020 at 08:11:45AM +0200, Rasmus Villemoes wrote: >> What we might do, to deal with the "caller fails to check the result", >> is to add a >> >> static inline bool __must_check must_check_overflow(bool b) { return >> unlikely(b); } >> >> and wrap all the final "did it overflow" results in that one - perhaps >> also for the __builtin_* cases, I don't know if those are automatically >> equipped with that attribute. [I also don't know if gcc propagates >> likely/unlikely out to the caller, but it shouldn't hurt to have it >> there and might improve code gen if it does.] > > (What is the formal name for the ({ ...; return_value; }) C construct?) https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html > Will that work as a macro return value? If so, that's extremely useful. Yes and no. Just wrapping the last expression in the statement expression with my must_check_overflow(), as in @@ -67,17 +72,18 @@ typeof(d) __d = (d);\ (void) (&__a == &__b); \ (void) (&__a == __d); \ - __builtin_sub_overflow(__a, __b, __d); \ + must_check_overflow(__builtin_sub_overflow(__a, __b, __d)); \ }) does not appear to work. For some reason, this can't be (ab)used to overrule the __must_check this simply: - kstrtoint(a, b, c); + ({ kstrtoint(a, b, c); }); still gives a warning for kstrtoint(). But failing to use the result of check_sub_overflow() as patched above does not give a warning. I'm guessing gcc has some internal very early simplification that replaces single-expression statement-exprs with just that expression, and the warn-unused-result triggers later. But as soon as the statement-expr becomes a little non-trivial (e.g. above), my guess is that the whole thing gets assigned to some internal "variable" representing the result, and that assignment then counts as a use of the return value from must_check_overflow() - cc'ing Segher, as he usually knows these details. 
Anyway, we don't need to apply it to the last expression inside ({}), we can just pass the whole ({}) to must_check_overflow() as in -#define check_sub_overflow(a, b, d) ({ \ +#define check_sub_overflow(a, b, d) must_check_overflow(({ \ typeof(a) __a = (a);\ typeof(b) __b = (b);\ typeof(d) __d = (d);\ (void) (&__a == &__b); \ (void) (&__a == __d); \ __builtin_sub_overflow(__a, __b, __d); \ -}) +})) and that's even more natural for the fallback cases which would be #define check_sub_overflow(a, b, d)\ + must_check_overflow(\ __builtin_choose_expr(is_signed_type(typeof(a)),\ __signed_sub_overflow(a, b, d), \ - __unsigned_sub_overflow(a, b, d)) + __unsigned_sub_overflow(a, b, d))) (in all cases with some whitespace reflowing). Rasmus
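For readers unfamiliar with the construct: the value of a GNU statement expression is that of its last expression, which is what lets these macros declare temporaries and still yield a result. A minimal example (the macro itself is made up for illustration):

```c
/* A GNU statement expression: the block evaluates to its last
 * expression, so the macro can introduce the temporary __x to avoid
 * evaluating x more than once, yet still produce a value. */
#define clamp_to_byte(x) ({				\
	int __x = (x);					\
	__x < 0 ? 0 : (__x > 255 ? 255 : __x);		\
})
```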
Re: [RFC] saturate check_*_overflow() output?
On 03/08/2020 20.29, Kees Cook wrote: > Hi, > > I wonder if we should explicitly saturate the output of the overflow > helpers as a side-effect of overflow detection? Please no. (That way the output > is never available with a "bad" value, if the caller fails to check the > result or forgets that *d was written...) since right now, *d will hold > the wrapped value. Exactly. I designed the fallback ones so they would have the same semantics as when using gcc's __builtin_* - though with the "all operands have same type" restriction, since it would be completely unwieldy to handle stuff like (s8) + (u64) -> (s32) in macros. > Also, if we enable arithmetic overflow detection sanitizers, we're going > to trip over the fallback implementation (since it'll wrap and then do > the overflow test in the macro). Huh? The fallback code only ever uses unsigned arithmetic, precisely to avoid triggering such warnings. Or are you saying there are some sanitizers out there which also warn for, say, (~0u) + 1u? Yes, detecting overflow/underflow for a (s32)-(s32)->(s32) without relying on -fwrapv is a bit messy, but it's done and AFAIK works just fine even with UBSAN enabled. What we might do, to deal with the "caller fails to check the result", is to add a static inline bool __must_check must_check_overflow(bool b) { return unlikely(b); } and wrap all the final "did it overflow" results in that one - perhaps also for the __builtin_* cases, I don't know if those are automatically equipped with that attribute. [I also don't know if gcc propagates likely/unlikely out to the caller, but it shouldn't hurt to have it there and might improve code gen if it does.] Rasmus PS: Another reason not to saturate is that there are two extreme values, and choosing between them makes the code very messy (especially when using the __builtins). 
5u-10u should saturate to 0u, not UINT_MAX, and even for underflowing a signed computation like INT_MIN + (-7); it makes no sense for that to saturate to INT_MAX.
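The fallback's trick of detecting signed overflow using only unsigned (wrapping) arithmetic can be sketched like this (a simplified userspace version for int only; the kernel's __signed_add_overflow() is macro-based and type-generic):

```c
#include <limits.h>
#include <stdbool.h>

/* Compute a + b in unsigned arithmetic (well-defined wraparound, so no
 * UB and nothing for UBSAN to flag), store the wrapped result, and
 * report overflow: it occurred iff both operands have the same sign and
 * the result's sign differs from it. */
static bool signed_add_overflows(int a, int b, int *d)
{
	unsigned int ua = (unsigned int)a;
	unsigned int ub = (unsigned int)b;
	unsigned int ur = ua + ub;
	unsigned int sign = 1u << (sizeof(int) * 8 - 1);

	*d = (int)ur;
	return ((~(ua ^ ub)) & (ua ^ ur)) & sign;
}
```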
Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall
On 10/07/2020 10.30, Cyrill Gorcunov wrote: > On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote: >> The ability to check open file descriptions for equality (without >> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be >> useful outside of the checkpoint/restore use case - for example, >> systemd uses kcmp() to deduplicate the per-service file descriptor >> store. >> >> Make it possible to have the kcmp() syscall without the full >> CONFIG_CHECKPOINT_RESTORE. >> >> Signed-off-by: Rasmus Villemoes >> --- >> I deliberately drop the ifdef in the eventpoll.h header rather than >> replace with KCMP_SYSCALL; it's harmless to declare a function that >> isn't defined anywhere. > > Could you please point why setting #ifdef KCMP_SYSCALL in eventpoll.h > is not suitable? It's just from a general "avoid ifdef clutter if possible" POV. The conditional declaration of the function doesn't really serve any purpose. > Still the overall idea is fine for me, thanks! > > Reviewed-by: Cyrill Gorcunov Thank you. Rasmus
[PATCH] kcmp: add separate Kconfig symbol for kcmp syscall
The ability to check open file descriptions for equality (without resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be useful outside of the checkpoint/restore use case - for example, systemd uses kcmp() to deduplicate the per-service file descriptor store. Make it possible to have the kcmp() syscall without the full CONFIG_CHECKPOINT_RESTORE. Signed-off-by: Rasmus Villemoes --- I deliberately drop the ifdef in the eventpoll.h header rather than replace with KCMP_SYSCALL; it's harmless to declare a function that isn't defined anywhere. fs/eventpoll.c| 4 ++-- include/linux/eventpoll.h | 2 -- init/Kconfig | 11 +++ kernel/Makefile | 2 +- 4 files changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 12eebcdea9c8..b0313ce2df73 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1064,7 +1064,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd) return epir; } -#ifdef CONFIG_CHECKPOINT_RESTORE +#ifdef CONFIG_KCMP_SYSCALL static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff) { struct rb_node *rbp; @@ -1106,7 +1106,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, return file_raw; } -#endif /* CONFIG_CHECKPOINT_RESTORE */ +#endif /* CONFIG_KCMP_SYSCALL */ /** * Adds a new entry to the tail of the list in a lockless way, i.e. 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 8f000fada5a4..aa799295e373 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -18,9 +18,7 @@ struct file; #ifdef CONFIG_EPOLL -#ifdef CONFIG_CHECKPOINT_RESTORE struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff); -#endif /* Used to initialize the epoll bits inside the "struct file" */ static inline void eventpoll_init_file(struct file *file) diff --git a/init/Kconfig b/init/Kconfig index 0498af567f70..95e9486d4217 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1158,9 +1158,20 @@ config NET_NS endif # NAMESPACES +config KCMP_SYSCALL + bool "kcmp system call" + help + Enable the kcmp system call, which allows one to determine + whether two tasks share various kernel resources, for example + whether they share address space, or if two file descriptors + refer to the same open file description. + + If unsure, say N. + config CHECKPOINT_RESTORE bool "Checkpoint/restore support" select PROC_CHILDREN + select KCMP_SYSCALL default n help Enables additional kernel features in a sake of checkpoint/restore. diff --git a/kernel/Makefile b/kernel/Makefile index f3218bc5ec69..3daedba2146a 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -49,7 +49,7 @@ obj-y += rcu/ obj-y += livepatch/ obj-y += dma/ -obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o +obj-$(CONFIG_KCMP_SYSCALL) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o -- 2.23.0
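For completeness, the userspace side of the open-file-description comparison looks roughly like this (hedged: kcmp() needs a kernel with the syscall enabled and suitable ptrace permissions, so the helper reports -1 when it is unavailable; there is no glibc wrapper, hence the raw syscall):

```c
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef KCMP_FILE
#define KCMP_FILE 0	/* value from <linux/kcmp.h> */
#endif

/* Returns 1 if fd1 and fd2 of the calling process refer to the same
 * open file description, 0 if not, -1 if kcmp() is unavailable
 * (ENOSYS, EPERM, ...). */
static int same_ofd(int fd1, int fd2)
{
#ifdef SYS_kcmp
	long ret = syscall(SYS_kcmp, getpid(), getpid(), KCMP_FILE,
			   fd1, fd2);
	if (ret < 0)
		return -1;
	return ret == 0;
#else
	(void)fd1; (void)fd2;
	return -1;
#endif
}
```

A dup()'ed descriptor shares its open file description with the original, whereas two independent open() calls on the same path do not - exactly the distinction fstat()/fcntl() comparisons cannot make reliably.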
Re: printk of non NULL terminated strings ?
On 09/07/2020 15.26, Joakim Tjernlund wrote: > On Thu, 2020-07-09 at 14:56 +0200, Andreas Schwab wrote: >> CAUTION: This email originated from outside of the organization. Do not >> click links or open attachments unless you recognize the sender and know the >> content is safe. >> >> >> On Jul 09 2020, Joakim Tjernlund wrote: >> >>> Is there a format (or other function) that lets me >>> print strings without an \0 terminator using an explicit length arg instead? >> >> Use the precision. > > Looking at that now but have a hard time figuring how to use it, can you give > me an example? Exactly as you'd do in userspace: printf("%.*s\n", len, buf) Of course, vsnprintf() will still stop if it encounters a nul byte within those first len bytes in buf. And you need len to have type int, so you may need a cast if you have a size_t or ssize_t or whatnot. Rasmus
Re: [sched] c3a340f7e7: invalid_opcode:#[##]
On 30/06/2020 14.46, Peter Zijlstra wrote: > On Mon, Jun 29, 2020 at 08:31:27AM +0800, kernel test robot wrote: >> Greeting, >> >> FYI, we noticed the following commit (built with gcc-4.9): >> >> commit: c3a340f7e7eadac7662ab104ceb16432e5a4c6b2 ("sched: Have >> sched_class_highest define by vmlinux.lds.h") > >> [1.840970] kernel BUG at kernel/sched/core.c:6652! > > W T H > > $ readelf -Wa defconfig-build/vmlinux | grep sched_class > 62931: c1e62d20 0 NOTYPE GLOBAL DEFAULT 2 __begin_sched_classes > 65736: c1e62f40 96 OBJECT GLOBAL DEFAULT 2 stop_sched_class > 71813: c1e62dc0 96 OBJECT GLOBAL DEFAULT 2 fair_sched_class > 78689: c1e62d40 96 OBJECT GLOBAL DEFAULT 2 idle_sched_class > 78953: c1e62fa0 0 NOTYPE GLOBAL DEFAULT 2 __end_sched_classes > 79090: c1e62e40 96 OBJECT GLOBAL DEFAULT 2 rt_sched_class > 79431: c1e62ec0 96 OBJECT GLOBAL DEFAULT 2 dl_sched_class > > $ printf "%d\n" $((0xc1e62dc0 - 0xc1e62d40)) > 128 > > So even though the object is 96 bytes in size, has an explicit 32 byte > alignment, the array ends up with a stride of 128 bytes !?!?! > > Consistently so with GCC-4.9. Any other GCC I tried does the sane thing. Does that include gcc 4.8, or is it only "anything newer than 4.9"? > > Full patch included below. > > Anybody any clue wth 4.9 is doing crazy things like this? Perhaps https://gcc.gnu.org/onlinedocs/gcc-4.9.4/gcc/Variable-Attributes.html#Variable-Attributes: When used on a struct, or struct member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning. is part of the explanation. But this is seriously weird. 
I don't know which .config you or the buildbot used, but I took an i386_defconfig with SMP=n to get a small enough struct sched_class (and disable retpoline stuff), added diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 81640fe0eae8..53c0d3ba62ba 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -71,6 +71,13 @@ unsigned int sysctl_sched_rt_period = 1000000; __read_mostly int scheduler_running; +void foo(void) +{ + extern void bar(int); + bar(sizeof(struct sched_class)); + bar(_Alignof(struct sched_class)); +} + /* * part of the period that we allow rt tasks to run in us. * default: 0.95s and apparently _Alignof is only 16: 00002c90 <foo>: 2c90: 55 push %ebp 2c91: b8 60 00 00 00 mov $0x60,%eax 2c96: 89 e5 mov %esp,%ebp 2c98: e8 fc ff ff ff call 2c99 2c99: R_386_PC32 bar 2c9d: b8 10 00 00 00 mov $0x10,%eax 2ca2: e8 fc ff ff ff call 2ca3 2ca3: R_386_PC32 bar Nevertheless, readelf -S --wide kernel/sched/fair.o: Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [35] __fair_sched_class PROGBITS 002980 60 00 A 0 0 64 so the section it was put in has an alignment of 64. The generated assembly is indeed .globl fair_sched_class .section __fair_sched_class,"a",@progbits .align 64 /me goes brew coffee
Re: [PATCH v3 bpf-next 4/8] printk: add type-printing %pT format specifier which uses BTF
On 23/06/2020 14.07, Alan Maguire wrote: > diff --git a/include/linux/printk.h b/include/linux/printk.h > index fc8f03c..8f8f5d2 100644 > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -618,4 +618,20 @@ static inline void print_hex_dump_debug(const char > *prefix_str, int prefix_type, > #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \ > print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true) > > +/** > + * struct btf_ptr is used for %pT (typed pointer) display; the > + * additional type string/BTF id are used to render the pointer > + * data as the appropriate type. > + */ > +struct btf_ptr { > + void *ptr; > + const char *type; > + u32 id; > +}; > + > +#define BTF_PTR_TYPE(ptrval, typeval) \ > + (&((struct btf_ptr){.ptr = ptrval, .type = #typeval})) > + > +#define BTF_PTR_ID(ptrval, idval) \ > + (&((struct btf_ptr){.ptr = ptrval, .id = idval})) Isn't there some better place to put this than printk.h? Anyway, you probably want the ptr member to be "const void*", to avoid "... discards const qualifier" warnings when somebody happens to have a "const struct foobar *". > #endif > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > index 259e558..c0d209d 100644 > --- a/lib/vsprintf.c > +++ b/lib/vsprintf.c > @@ -44,6 +44,7 @@ > #ifdef CONFIG_BLOCK > #include > #endif > +#include > > #include "../mm/internal.h" /* For the trace_print_flags arrays */ > > @@ -2092,6 +2093,87 @@ char *fwnode_string(char *buf, char *end, struct > fwnode_handle *fwnode, > return widen_string(buf, buf - buf_start, end, spec); > } > > +#define btf_modifier_flag(c) (c == 'c' ? BTF_SHOW_COMPACT : \ > + c == 'N' ? BTF_SHOW_NONAME : \ > + c == 'x' ? BTF_SHOW_PTR_RAW : \ > + c == 'u' ? BTF_SHOW_UNSAFE : \ > + c == '0' ? 
BTF_SHOW_ZERO : 0) > + > +static noinline_for_stack > +char *btf_string(char *buf, char *end, void *ptr, struct printf_spec spec, > + const char *fmt) > +{ > + struct btf_ptr *bp = (struct btf_ptr *)ptr; > + u8 btf_kind = BTF_KIND_TYPEDEF; > + const struct btf_type *t; > + const struct btf *btf; > + char *buf_start = buf; > + const char *btf_type; > + u64 flags = 0, mod; > + s32 btf_id; > + > + if (check_pointer(&buf, end, ptr, spec)) > + return buf; > + > + if (check_pointer(&buf, end, bp->ptr, spec)) > + return buf; > + > + while (isalnum(*fmt)) { > + mod = btf_modifier_flag(*fmt); > + if (!mod) > + break; > + flags |= mod; > + fmt++; > + } > + > + btf = bpf_get_btf_vmlinux(); AFAICT, this function is only compiled if CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y, and I don't see any static inline stub defined anywhere. Have you built the kernel with one or both of those turned off? Rasmus
Re: [PATCH v3 00/21] dynamic_debug cleanups, query features, export
On 17/06/2020 18.25, Jim Cromie wrote: > this is v3, changes from previous: > - moved non-controversial commits up front, refactors to help this. > - a few more minor cleanups > - left out the WIP patches > - export ddebug_exec_queries() > - accept file=foo:func only, not module:foo > - varname changes > > v2: > https://lore.kernel.org/lkml/20200613155738.2249399-1-jim.cro...@gmail.com/ > v1: https://lore.kernel.org/lkml/20200605162645.289174-1-jim.cro...@gmail.com/ > > Patchset starts with 11 cleanups; > - change section name from vague "__verbose" to "__dyndbg" > - cleaner docs, drop obsolete comment & useless debug prints, >refine verbosity, fix a BUG_ON, ram reporting miscounts. etc.. So I haven't been following too closely, but I'm also a bit skeptical about the new custom flag thing. OTOH, I'd really like to see those first cleanups go in soon, especially patch 6 - which not only makes the ram use a bit more accurate, it also avoids ~1 calls of strlen() on cache-cold memory during boot. So, FWIW, you have my Acked-by for patches 1 through 11, and I hope those can be picked up before the next merge window. But the remaining ones seem to still require some discussion. Rasmus
Re: WIP generic module->debug_flags and dynamic_debug
On 10/06/2020 20.32, jim.cro...@gmail.com wrote: > so Ive got a WIP / broken / partial approach to giving all modules a > u32 flagset, > and enabling pr_debug based upon it. I leave out the "pr_debug_typed( > bitpos )" for now. For Stanimir, bits 1,2,3 could be high, middle, > low. > > ATM its broken on my lack of container_of() skills. > > Im trying to use it to get a struct module* using its name value thats > been copied > into a ddebug_table member. > > Im relying on > cdf6d006968 dynamic_debug: don't duplicate modname in ddebug_add_module > to have the same value in both structs > > but Im clearly missing a few things > besides the likely future trouble with .rodata builtin modules > (after compile prob solved) > > It seems container_of wants me to use struct ddebug_table instead, > but I dont want a *ddebug_table. > Ive blindly guessed at adding & and * to 1st param, w/o understanding. > > can anyone diagnose my problem ? Sorry, I have not the faintest idea of what you're trying to achieve. Can you spell that out? Rasmus
must populate_rootfs be synchronous?
I have worked on several boards where there is a rather "aggressive" external (gpio) watchdog, with timeouts between 250 and 800 ms. Combined with a rather slow CPU and memory, these boards often fail to make it through populate_rootfs, since unpacking the initramfs is rather time-consuming, and we haven't gotten to get the watchdog driver handle the watchdog. [No, GPIO_WATCHDOG_ARCH_INITCALL doesn't help, at least not for all cases, probably because the device isn't "discovered" until mpc83xx_declare_of_platform_devices runs, which is at device_initcall time]. So, those boards currently use some rather ugly patches to make them boot. I'd like to get rid of those. I assume there's a good reason populate_rootfs runs between fs_initcall and device_initcalls - perhaps one of the latter wants some firmware or do a request_module? But, would it be possible to throw most of populate_rootfs into a work item, add some globally visible DECLARE_COMPLETION(initramfs_unpacked) which is complete_all'ed at the end, and then in the (I assume) relatively few places that might need to look at the filesystem add a wait_for_completion(initramfs_unpacked) - including of course right before the console_on_rootfs() call in kernel_init_freeable() (so also before we start asking whether there is /init or not). Rasmus PS: This is the slowness, .7 seconds to unpack 4MB - it got a bit better when switching to lz4 compression, but still in the few hundreds of milliseconds range, which is way too much given the external watchdog's requirements: [0.047970] Trying to unpack rootfs image as initramfs... [0.768516] Freeing initrd memory: 3972K
Re: [PATCH resend] fs/namei.c: micro-optimize acl_permission_check
On 07/06/2020 21.48, Linus Torvalds wrote: > On Sun, Jun 7, 2020 at 9:37 AM Linus Torvalds > wrote: >> >>> That will kinda work, except you do that mask &= MAY_RWX before >>> check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. >> >> Good catch. > > With the change to not clear the non-rwx bits in general, the owner > case now wants to do the masking, and then the "shift left by 6" > modification makes no sense since it only makes for a bigger constant > (the only reason to do the shift-right was so that the bitwise not of > the i_mode could be done in parallel with the shift, but with the > masking that instruction scheduling optimization becomes kind of > immaterial too). So I modified that patch to not bother, and add a > comment about MAY_NOT_BLOCK. > > And since I was looking at the MAY_NOT_BLOCK logic, it was not only > not mentioned in comments, it also had some really confusing code > around it. > > The posix_acl_permission() looked like it tried to conserve that bit, > which is completely wrong. It wasn't a bug only for the simple reason > that the only two call-sites had either explicitly cleared the bit > when calling, or had tested that the bit wasn't set in the first > place. > > So as a result, I wrote a second patch to clear that confusion up. > > Rasmus, say the word and I'll mark you for authorship on the first one. It might be a bit confusing with me mentioned in the third person and then also author, and it's really mostly your patch, so reported-by is fine with me. But it's up to you. > Comments? Can you find something else wrong here, or some other fixup to do? No, I think it's ok. I took a look at the disassembly and it looks fine. There's an extra push/pop of %r14 [that's where gcc computes mode>>3, then CSE allows it to do cmovne %r14d,%ebx after in_group_p), so the owner case gets slightly penalized. 
I think/hope the savings from avoiding the in_group_p should compensate for that - any absolute path open() by non-root saves at least two in_group_p. YMMV. Rasmus
Re: [PATCH resend] fs/namei.c: micro-optimize acl_permission_check
On 05/06/2020 22.18, Linus Torvalds wrote: > On Fri, Jun 5, 2020 at 7:23 AM Rasmus Villemoes > wrote: >> >> + /* >> +* If the "group" and "other" permissions are the same, >> +* there's no point calling in_group_p() to decide which >> +* set to use. >> +*/ >> + if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid)) >> mode >>= 3; > > Ugh. Not only is this ugly, but it's not even the best optimization. > > We don't care that group and other match exactly. We only care that > they match in the low 3 bits of the "mask" bits. Yes, I did think about that, but I thought this was the more obviously correct approach, and that in practice one only sees the 0X44 and 0X55 cases. > So if we want this optimization - and it sounds worth it - I think we > should do it right. But I also think it should be written more > legibly. > > And the "& 7" is the same "& (MAY_READ | MAY_WRITE | MAY_EXEC)" we do later. > > In other words, if we do this, I'd like it to be done even more > aggressively, but I'd also like the end result to be a lot more > readable and have more comments about why we do that odd thing. > > Something like this *UNTESTED* patch, perhaps? That will kinda work, except you do that mask &= MAY_RWX before check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits. > I might have gotten something wrong, so this would need > double-checking, but if it's right, I find it a _lot_ more easy to > understand than making one expression that is pretty complicated and > opaque. Well, I thought this was readable enough with the added comment. There's already that magic constant 3 in the shifts, so the 7 seemed entirely sensible, though one could spell it 0007. Whatever. Perhaps this? As a whole function, I think that's a bit easier for brain-storming. It's your patch, just with that rwx thing used instead of mask, except for the call to check_acl(). 
static int acl_permission_check(struct inode *inode, int mask) { unsigned int mode = inode->i_mode; unsigned int rwx = mask & (MAY_READ | MAY_WRITE | MAY_EXEC); /* Are we the owner? If so, ACL's don't matter */ if (likely(uid_eq(current_fsuid(), inode->i_uid))) { if ((rwx << 6) & ~mode) return -EACCES; return 0; } /* Do we have ACL's? */ if (IS_POSIXACL(inode) && (mode & S_IRWXG)) { int error = check_acl(inode, mask); if (error != -EAGAIN) return error; } /* * Are the group permissions different from * the other permissions in the bits we care * about? Need to check group ownership if so. */ if (rwx & (mode ^ (mode >> 3))) { if (in_group_p(inode->i_gid)) mode >>= 3; } /* Bits in 'mode' clear that we require? */ return (rwx & ~mode) ? -EACCES : 0; } Rasmus
[PATCH resend] fs/namei.c: micro-optimize acl_permission_check
Just something like open(/usr/include/sys/stat.h) causes five calls of generic_permission -> acl_permission_check -> in_group_p; if the compiler first tried /usr/local/include/..., that would be a few more. Altogether, on a bog-standard Ubuntu 20.04 install, a workload consisting of compiling lots of userspace programs (i.e., calling lots of short-lived programs that all need to get their shared libs mapped in, and the compilers poking around looking for system headers - lots of /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around 0.1% according to perf top. With an artificial load such as while true ; do find /usr/ -print0 | xargs -0 stat > /dev/null ; done that jumps to over 0.4%. System-installed files are almost always 0755 (directories and binaries) or 0644, so in most cases, we can avoid the binary search and the cost of pulling the cred->groups array and in_group_p() .text into the cpu cache. Signed-off-by: Rasmus Villemoes --- fs/namei.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index d81f73ff1a8b..c6f0c6643db5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -303,7 +303,12 @@ static int acl_permission_check(struct inode *inode, int mask) return error; } - if (in_group_p(inode->i_gid)) + /* +* If the "group" and "other" permissions are the same, +* there's no point calling in_group_p() to decide which +* set to use. +*/ + if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid)) mode >>= 3; } -- 2.23.0
Re: [RFC][PATCH 0/4] x86/entry: disallow #DB more
On 23/05/2020 23.32, Peter Zijlstra wrote: > On Sat, May 23, 2020 at 02:59:40PM +0200, Peter Zijlstra wrote: >> On Fri, May 22, 2020 at 03:13:57PM -0700, Andy Lutomirski wrote: > >> Good point, so the trivial optimization is below. I couldn't find >> instruction latency numbers for DRn load/stores anywhere. I'm hoping >> loads are cheap. > > + u64 empty = 0, read = 0, write = 0; > + unsigned long dr7; > + > + for (i=0; i<100; i++) { > + u64 s; > + > + s = rdtsc(); > + barrier_nospec(); > + barrier_nospec(); > + empty += rdtsc() - s; > + > + s = rdtsc(); > + barrier_nospec(); > + dr7 = native_get_debugreg(7); > + barrier_nospec(); > + read += rdtsc() - s; > + > + s = rdtsc(); > + barrier_nospec(); > + native_set_debugreg(7, 0); > + barrier_nospec(); > + write += rdtsc() - s; > + } > + > + printk("XXX: %ld %ld %ld\n", empty, read, write); > > > [1.628125] XXX: 2800 2404 19600 > > IOW, reading DR7 is basically free, and certainly cheaper than looking > at cpu_dr7 which would probably be an insta cache miss. > Naive question: did you check disassembly to see whether gcc threw your native_get_debugreg() away, given that the asm isn't volatile and the result is not used for anything? Testing here only shows a "mov %r9,%db7", but the read did seem to get thrown away. Rasmus
Re: [PATCH] sched/fair: Return true,false in voluntary_active_balance()
On 08/05/2020 10.16, Peter Zijlstra wrote: > On Thu, May 07, 2020 at 07:06:25PM +0800, Jason Yan wrote: >> Fix the following coccicheck warning: >> >> kernel/sched/fair.c:9375:9-10: WARNING: return of 0/1 in function >> 'voluntary_active_balance' with return type bool > > That's not a warning, that's a broken cocinelle script, which if these > stupid patches don't stop, I'm going to delete myself! I was wondering why I got cc'ed here. Then it dawned on me. What can I say, I was young. Ack on nuking it. Rasmus
Re: [PATCH 5/5] rtc: pcf2127: report battery switch over
On 05/05/2020 22.13, Alexandre Belloni wrote: > Add support for the RTC_VL_BACKUP_SWITCH flag to report battery switch over > events. > > Signed-off-by: Alexandre Belloni > --- > drivers/rtc/rtc-pcf2127.c | 16 > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c > index 039078029bd4..967de68e1b03 100644 > --- a/drivers/rtc/rtc-pcf2127.c > +++ b/drivers/rtc/rtc-pcf2127.c > @@ -188,18 +188,27 @@ static int pcf2127_rtc_ioctl(struct device *dev, > unsigned int cmd, unsigned long arg) > { > struct pcf2127 *pcf2127 = dev_get_drvdata(dev); > - int touser; > + int val, touser = 0; > int ret; > > switch (cmd) { > case RTC_VL_READ: > - ret = regmap_read(pcf2127->regmap, PCF2127_REG_CTRL3, &touser); > + ret = regmap_read(pcf2127->regmap, PCF2127_REG_CTRL3, &val); > if (ret) > return ret; > > - touser = touser & PCF2127_BIT_CTRL3_BLF ? RTC_VL_BACKUP_LOW : 0; > + if (val & PCF2127_BIT_CTRL3_BLF) > + touser = RTC_VL_BACKUP_LOW; > + > + if (val & PCF2127_BIT_CTRL3_BF) > + touser |= RTC_VL_BACKUP_SWITCH; I think it's a bit easier to read if you use |= in both cases. Re patch 3, one saves a little .text by eliding the ioctl function when, as you say, it cannot be called anyway. No strong opinion either way, I don't think anybody actually builds without CONFIG_RTC_INTF_DEV, but those that do are probably the ones that care about having a tiny vmlinux. Other than that, the series looks good to me. Thanks, Rasmus
Re: battery switch-over detection on pcf2127
On 05/05/2020 22.38, Bruno Thomsen wrote: > Hi Rasmus > > Den tir. 5. maj 2020 kl. 22.07 skrev Alexandre Belloni > : >> >> On 05/05/2020 21:54:47+0200, Rasmus Villemoes wrote: >>> Hi Bruno >>> >>> I just noticed your "rtc: pcf2127: add tamper detection support" >>> (03623b4b04) from 5.4. Unfortunately, clearing the BTSE bit breaks a use >>> case of ours: >>> >>> We rely on the battery switch-over detection to distinguish a powerfail >>> during boot from a PORESET by the external watchdog (in the latter case, >>> the RTC is still powered throughout, meaning there is no battery >>> switch-over event). OTOH, we do not use the tamper detection - in fact, >>> the TS signal is unconnected on our board. >>> >>> We're currently still on 4.19, but we will eventually upgrade to a >>> kernel containing the above commit. So I was wondering if we could >>> figure out a way that would work for both of us - either some CONFIG >>> knob, or perhaps something in the device-tree. Any ideas? >>> >> >> Yes, I was working on a patch series last week allowing to read BF. I'm >> not sure clearing BTSE is your issue but clearing BF is. >> >> I'm going to send it tonight, I'll copy you, let me now if that works >> for you. You can then read BF using the RTC_VL_READ ioctl. The >> RTC_VL_BACKUP_SWITCH flag will be set if a switchover happened. >> The RTC_VL_CLR ioctl can be used to clear the flag. >> >> I think clearing BTSE is still the right thing to do. > > I think your use case is valid and it sounds like Alexandre solution will > solve it as you just need to know if a battery switch-over has happened > not when exactly it happened. > > I can help test the patches too. Thanks for the quick replies, both. 
Unfortunately, being able to read BF from linux is not relevant to us - all the handling happens early in the bootloader (including clearing BF, so that we can detect that the previous boot failed only because of power fail - hence whether the linux driver clears BF or not is not relevant). We really just want linux to not touch the bits in CTRL3 at all. Hm, wait. Re-reading the above suggests that BF can get set even if BTSE is not, and a quick experiment shows that is true - I must have misread the data sheet. While I think that's fine for now (currently I only print the time of last switch-over as a diagnostic), I did have some use case in mind for comparing that timestamp to the current time and make decisions based on that. But until I figure out exactly what I want to use it for, and until we actually upgrade to 5.4+, there's no rush. Thanks, Rasmus
battery switch-over detection on pcf2127
Hi Bruno I just noticed your "rtc: pcf2127: add tamper detection support" (03623b4b04) from 5.4. Unfortunately, clearing the BTSE bit breaks a use case of ours: We rely on the battery switch-over detection to distinguish a powerfail during boot from a PORESET by the external watchdog (in the latter case, the RTC is still powered throughout, meaning there is no battery switch-over event). OTOH, we do not use the tamper detection - in fact, the TS signal is unconnected on our board. We're currently still on 4.19, but we will eventually upgrade to a kernel containing the above commit. So I was wondering if we could figure out a way that would work for both of us - either some CONFIG knob, or perhaps something in the device-tree. Any ideas? Rasmus
Re: [PATCH v4 14/18] static_call: Add static_cond_call()
On 04/05/2020 22.14, Peter Zijlstra wrote: > On Mon, May 04, 2020 at 09:20:03AM +0200, Rasmus Villemoes wrote: > >> >> Indeed, that is horrible. And it "fixes" the argument evaluation by >> changing the !HAVE_STATIC_CALL case to match the HAVE_STATIC_CALL, not >> the other way around, > > Correct; making it the other way is far more 'interesting'. It would > basically mean combining the static_branch and static_call, but that > would also make it less optimal for simple forwarding cases. Yes, I can see how implementing that would be quite hard. >> which means that it is not a direct equivalent to the >> >> if (foo) >> foo(a, b, c) >> >> [which pattern of course has the READ_ONCE issue, but each individual >> existing site with that may be ok for various reasons]. >> >> Is gcc smart enough to change the if (!func) to a jump across the >> function call (but still evaluting side effects in args), or is >> __static_call_nop actually emitted and called? > > I was hoping it would be clever, but I just tried (find below) and it is > not -- although there's always hoping a newer version / clang might be > smarter. > > It does indeed emit the nop function :/ Hm. >> If the latter, then one >> might as well patch the write-side to do "WRITE_ONCE(foo, func ? : >> __static_call_nop)" and elide the test from __static_cond_call() - in >> fact, that just becomes a single READ_ONCE. [There's probably some >> annoying issue with making sure static initialization of foo points at >> __static_call_nop]. > > But that would not give a more clever compiler the ability to do the > 'right' thing here.. True. However, I think it's unlikely we'll see a compiler being that clever anytime soon. Anyway, it's hard to judge what version of the !HAVE_STATIC_CALL implementation is best when there's no !HAVE_STATIC_CALL use cases to look at. 
I just want to ensure that whatever limitations or gotchas (e.g., "arguments are evaluated regardless of NULLness of func", or alternatively "arguments must not have side effects") it ends up with get documented. > > #define __static_cond_call(name) \ > ({ \ > void *func = READ_ONCE(name.func); \ > if (!func) \ > func = &__static_call_nop; \ > (typeof(__SCT__##name)*)func; \ > }) I think you can just make it #define __static_cond_call(name) \ ( \ (typeof(__SCT__##name)*) ((void *)name.func ? : (void *)__static_call_nop) \ ) but that simplification is not enough to make gcc change its mind about how to compile it :( But I'm guessing that various sanitizers or static checkers might complain about the UB. Rasmus
Re: [PATCH v4 14/18] static_call: Add static_cond_call()
On 03/05/2020 14.58, Peter Zijlstra wrote: > On Sat, May 02, 2020 at 03:08:00PM +0200, Rasmus Villemoes wrote: >> On 01/05/2020 22.29, Peter Zijlstra wrote: >>> +#define static_cond_call(name) >>> \ >>> + if (STATIC_CALL_KEY(name).func) \ >>> + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) >>> + >> >> This addresses neither the READ_ONCE issue nor the fact that, >> AFAICT, the semantics of >> >> static_cond_call(foo)(i++) >> >> will depend on CONFIG_HAVE_STATIC_CALL. > > True. > > So there is something utterly terrible we can do to address both: > > void __static_call_nop(void) > { > } > > #define __static_cond_call(name) \ > ({ \ > void *func = READ_ONCE(STATIC_CALL_KEY(name).func); \ > if (!func) \ > func = &__static_call_nop; \ > (typeof(STATIC_CALL_TRAMP(name))*)func; \ > }) > > #define static_cond_call(name) (void)__static_cond_call(name) > > This gets us into Undefined Behaviour territory, but it ought to work. > > It adds the READ_ONCE(), and it cures the argument evaluation issue. Indeed, that is horrible. And it "fixes" the argument evaluation by changing the !HAVE_STATIC_CALL case to match the HAVE_STATIC_CALL, not the other way around, which means that it is not a direct equivalent to the if (foo) foo(a, b, c) [which pattern of course has the READ_ONCE issue, but each individual existing site with that may be ok for various reasons]. Is gcc smart enough to change the if (!func) to a jump across the function call (but still evaluting side effects in args), or is __static_call_nop actually emitted and called? If the latter, then one might as well patch the write-side to do "WRITE_ONCE(foo, func ? : __static_call_nop)" and elide the test from __static_cond_call() - in fact, that just becomes a single READ_ONCE. [There's probably some annoying issue with making sure static initialization of foo points at __static_call_nop]. 
And that brings me to the other issue I raised - do you have a few examples of call sites that could use this, so we can see disassembly before/after? I'm still concerned that, even if there are no side-effects in the arguments, you still force the compiler to spill/shuffle registers for call/restore unconditionally, whereas with a good'ol if(), all that work is guarded by the load+test. Rasmus
Re: [PATCH v4 14/18] static_call: Add static_cond_call()
On 01/05/2020 22.29, Peter Zijlstra wrote: > Extend the static_call infrastructure to optimize the following common > pattern: > > if (func_ptr) > func_ptr(args...) > > + > #define static_call(name)__static_call(name) > +#define static_cond_call(name) (void)__static_call(name) > > + > #define static_call(name)__static_call(name) > +#define static_cond_call(name) (void)__static_call(name) > > +#define static_cond_call(name) > \ > + if (STATIC_CALL_KEY(name).func) \ > + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func)) > + This addresses neither the READ_ONCE issue nor the fact that, AFAICT, the semantics of static_cond_call(foo)(i++) will depend on CONFIG_HAVE_STATIC_CALL. Also, I'd have appreciated being cc'ed on new revisions instead of stumbling on it by chance. Rasmus
Re: [PATCH RT 0/2] Linux v4.19.115-rt50-rc1
On 28/04/2020 21.52, zanu...@kernel.org wrote: > From: Tom Zanussi > > Dear RT Folks, > > This is the RT stable review cycle of patch 4.19.115-rt50-rc1. > > Please scream at me if I messed something up. Please test the patches > too. For good measure, the customer reports that 4.19.115-rt50-rc1 indeed seems to fix the boot hang they saw. Thanks, Rasmus
Re: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing
On 20/04/2020 18.32, Joe Perches wrote: > On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote: struct sk_buff *skb = alloc_skb(64, GFP_KERNEL); pr_info("%pTN", skb); >>> >>> why follow "TN" convention? >>> I think "%p" is much more obvious, unambiguous, and >>> equally easy to parse. >>> >> >> That was my first choice, but the first character >> after the 'p' in the '%p' specifier signifies the >> pointer format specifier. If we use '<', and have >> '%p<', where do we put the modifiers? '%p' >> seems clunky to me. There's also the issue that %p followed by alnum has been understood to be reserved/specially handled by the kernel's printf implementation for a decade, while other characters have so far been treated as "OK, this was just a normal %p". A quick grep for %p< only gives a hit in drivers/scsi/dc395x.c, but there could be others (with field width modifier between % and p), and in any case I think it's a bad idea to extend the set of characters that cannot follow %p. Rasmus
Re: [RFC PATCH bpf-next 5/6] printk: add type-printing %pT format specifier which uses BTF
On 17/04/2020 12.42, Alan Maguire wrote: > printk supports multiple pointer object type specifiers (printing > netdev features etc). Extend this support using BTF to cover > arbitrary types. "%pT" specifies the typed format, and a suffix > enclosed specifies the type, for example, specifying > > printk("%pT", skb) > > ...will utilize BTF information to traverse the struct sk_buff * > and display it. Support is present for structs, unions, enums, > typedefs and core types (though in the latter case there's not > much value in using this feature of course). > > Default output is compact, specifying values only, but the > 'N' modifier can be used to show field names to more easily > track values. Pointer values are obfuscated as usual. As > an example: > > struct sk_buff *skb = alloc_skb(64, GFP_KERNEL); > pr_info("%pTN", skb); > > ...gives us: > > {{{.next=c7916e9c,.prev=c7916e9c,{.dev=c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=c7916e9c,.rb_left=c7916e9c}|.list={.next=c7916e9c,.prev=c7916e9c}},{.sk=c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=c7916e9c}|.tcp_tsorted_anchor={.next=c7916e9c,.prev=c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co > > printk output is truncated at 1024 bytes. For such cases, the compact > display mode (minus the field info) may be used. "|" differentiates > between different union members. 
>
> Signed-off-by: Alan Maguire
> ---
>  Documentation/core-api/printk-formats.rst |   8 ++
>  include/linux/btf.h                       |   3 +-
>  lib/Kconfig                               |  16
>  lib/vsprintf.c                            | 145 +-
>  4 files changed, 169 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
> index 8ebe46b1..b786577 100644
> --- a/Documentation/core-api/printk-formats.rst
> +++ b/Documentation/core-api/printk-formats.rst
> @@ -545,6 +545,14 @@ For printing netdev_features_t.
>
>  Passed by reference.
>
> +BTF-based printing of pointer data
> +----------------------------------
> +If '%pT[N]<type>' is specified, use the BPF Type Format (BTF) to
> +show the typed data. For example, specifying '%pT<struct sk_buff>' will utilize
> +BTF information to traverse the struct sk_buff * and display it.
> +
> +Supported modifier is 'N' (show type field names).
> +
>  Thanks
>  ======
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 2f78dc8..456bd8f 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -158,10 +158,11 @@ static inline const struct btf_member *btf_type_member(const struct btf_type *t)
>  	return (const struct btf_member *)(t + 1);
>  }
>
> +struct btf *btf_parse_vmlinux(void);
> +
>  #ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> -struct btf *btf_parse_vmlinux(void);
>  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index bc7e563..e92109e 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -6,6 +6,22 @@
>  config BINARY_PRINTF
>  	def_bool n
>
> +config BTF_PRINTF

I don't see any IS_ENABLED(BTF_PRINTF) anywhere in this patch? Shouldn't
the vsprintf.c handler be guarded by that?
> +#define is_btf_fmt_start(c)	(c == 'T')
> +#define is_btf_type_start(c)	(c == '<')
> +#define is_btf_type_end(c)	(c == '>')
> +
> +#define btf_modifier_flag(c)	(c == 'N' ? BTF_SHOW_NAME : 0)
> +
> +static noinline_for_stack
> +const char *skip_btf_type(const char *fmt, bool *found_btf_type)
> +{
> +	*found_btf_type = false;
> +
> +	if (!is_btf_fmt_start(*fmt))
> +		return fmt;
> +	fmt++;
> +
> +	while (btf_modifier_flag(*fmt))
> +		fmt++;
> +
> +	if (!is_btf_type_start(*fmt))
> +		return fmt;
> +
> +	while (!is_btf_type_end(*fmt) && *fmt != '\0')
> +		fmt++;
> +
> +	if (is_btf_type_end(*fmt)) {
> +		fmt++;
> +		*found_btf_type = true;
> +	}
> +
> +	return fmt;
> +}
> +
> +static noinline_for_stack
> +char *btf_string(char *buf, char *end, void *ptr, struct printf_
[PATCH -rt] hrtimer: fix logic for when grabbing softirq_expiry_lock can be elided
Commit "hrtimer: Add a missing bracket and hide `migration_base' on
!SMP", which is 47b6de0b7f22 in 5.2-rt and 40aae5708e7a in 4.19-rt,
inadvertently changed the logic from base != &migration_base to base ==
&migration_base.

On !CONFIG_SMP, the effect was to effectively always elide this
lock/unlock pair (since is_migration_base() is unconditionally false),
which for me consistently causes lockups during reboot, and reportedly
also often causes a hang during boot.

Adding this logical negation (or, what is effectively the same thing on
!CONFIG_SMP, reverting the above commit as well as "hrtimer: Prevent
using hrtimer_grab_expiry_lock() on migration_base") fixes that lockup.

Fixes: 40aae5708e7a ("hrtimer: Add a missing bracket and hide `migration_base' on !SMP") # 4.19-rt
Fixes: 47b6de0b7f22 ("hrtimer: Add a missing bracket and hide `migration_base' on !SMP") # 5.2-rt
Signed-off-by: Rasmus Villemoes
---
Something like this? I wasn't sure what Fixes: tag(s) to include, if
any. It's quite possible the same fix is needed on earlier -rt kernels,
I didn't check.

 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index e54a95de8b79..c3966c090246 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -953,7 +953,7 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer)
 {
 	struct hrtimer_clock_base *base = READ_ONCE(timer->base);
 
-	if (timer->is_soft && is_migration_base(base)) {
+	if (timer->is_soft && !is_migration_base(base)) {
 		spin_lock(&base->cpu_base->softirq_expiry_lock);
 		spin_unlock(&base->cpu_base->softirq_expiry_lock);
 	}
-- 
2.23.0
Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base
On 28/04/2020 14.59, Tom Zanussi wrote:
> On Tue, 2020-04-28 at 09:03 +0200, Rasmus Villemoes wrote:
>> Hold on a second. This patch (hrtimer: Prevent using
>> hrtimer_grab_expiry_lock() on migration_base) indeed seems to implement
>> the optimization implied by the above, namely avoid the lock/unlock in
>> case base == migration_base:
>>
>>> -	if (timer->is_soft && base && base->cpu_base) {
>>> +	if (timer->is_soft && base != &migration_base) {
>>
>> But the followup patch (hrtimer: Add a missing bracket and hide
>> `migration_base on !SMP) to fix the build on !SMP [the missing bracket
>> part seems to have been fixed when backporting the above to 4.19-rt]
>> replaces that logic by
>>
>> +static inline bool is_migration_base(struct hrtimer_clock_base *base)
>> +{
>> +	return base == &migration_base;
>> +}
>> +
>> ...
>> -	if (timer->is_soft && base != &migration_base) {
>> +	if (timer->is_soft && is_migration_base(base)) {
>>
>> in the SMP case, i.e. the exact opposite condition. One of these can't
>> be correct.
>>
>> Assuming the followup patch was wrong and the condition should have read
>>
>>	timer->is_soft && !is_migration_base(base)
>>
>> while keeping is_migration_base() false on !SMP might explain the
>> problem I see. But I'd like someone who knows this code to chime in.
>
> I don't know this code, but I think you're correct - the followup patch
> reversed the condition by forgetting the !.
>
> So, does your problem go away when you make that change?

Yes, it does. (I'll have to ask the customer to check in their setup
whether the boot hang also vanishes). Essentially, adding that ! is
equivalent to reverting the two patches on !SMP (which I also tested):
Before, the condition was

	timer->is_soft && base && base->cpu_base

and, assuming the NULL pointer checks are indeed redundant, that's the
same as "timer->is_soft". Appending "&& !is_migration_base(base)" to
that, with is_migration_base() always false as on !SMP, doesn't change
anything.
Thanks, Rasmus
Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base
On 23/01/2020 21.39, Steven Rostedt wrote:
> 4.19.94-rt39-rc2 stable review patch.
> If anyone has any objections, please let me know.
>
> --
>
> From: Julien Grall
>
> [ Upstream commit cef1b87f98823af923a386f3f69149acb212d4a1 ]
>
> As tglx puts it:
> |If base == migration_base then there is no point to lock soft_expiry_lock
> |simply because the timer is not executing the callback in soft irq context
> |and the whole lock/unlock dance can be avoided.

Hold on a second. This patch (hrtimer: Prevent using
hrtimer_grab_expiry_lock() on migration_base) indeed seems to implement
the optimization implied by the above, namely avoid the lock/unlock in
case base == migration_base:

> -	if (timer->is_soft && base && base->cpu_base) {
> +	if (timer->is_soft && base != &migration_base) {

But the followup patch (hrtimer: Add a missing bracket and hide
`migration_base on !SMP) to fix the build on !SMP [the missing bracket
part seems to have been fixed when backporting the above to 4.19-rt]
replaces that logic by

+static inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+	return base == &migration_base;
+}
+
...
-	if (timer->is_soft && base != &migration_base) {
+	if (timer->is_soft && is_migration_base(base)) {

in the SMP case, i.e. the exact opposite condition. One of these can't
be correct.

Assuming the followup patch was wrong and the condition should have read

	timer->is_soft && !is_migration_base(base)

while keeping is_migration_base() false on !SMP might explain the
problem I see. But I'd like someone who knows this code to chime in.

Thanks,
Rasmus
Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base
On 27/04/2020 21.26, Tom Zanussi wrote:
> On Mon, 2020-04-27 at 15:06 -0400, Steven Rostedt wrote:
>> On Mon, 27 Apr 2020 15:10:00 +0200
>> Rasmus Villemoes wrote:
>>
>>> Reverting
>>>
>>> b1a471ec4df1 - hrtimer: Prevent using hrtimer_grab_expiry_lock() on
>>> migration_base
>>> 40aae5708e7a - hrtimer: Add a missing bracket and hide `migration_base'
>>> on !SMP
>>>
>>> on top of v4.19.94-rt39 makes that problem go away, i.e. the board
>>> reboots as expected.
>>
>> Thanks Rasmus for looking into this. Tom now maintains 4.19-rt.
>>
>> Tom, care to pull in these patches on top of 4.19-rt?
>
> Those patches are already in 4.19-rt - he's saying that reverting them
> fixes the problem.
>
> I'm guessing that the assumption of base or base->cpu_base always being
> non-NULL in those patches might be wrong. If so, the below patch
> should fix the problem:
>
> Subject: [PATCH] hrtimer: Add back base and base->cpu_base checks in
> hrtimer_grab_expiry_lock()
>
> 4.19 commit b1a471ec4df1 [hrtimer: Prevent using
> hrtimer_grab_expiry_lock() on migration_base] removed the NULL checks
> for timer->base and timer->base->cpu_base on the assumption that
> they're always non-NULL. That assumption is apparently not to be
> true, so add the checks back.
>
> Signed-off-by: Tom Zanussi
> ---
>  kernel/time/hrtimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index e54a95de8b79..6f20cf23008b 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -953,7 +953,7 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer)
>  {
>  	struct hrtimer_clock_base *base = READ_ONCE(timer->base);
>
> -	if (timer->is_soft && is_migration_base(base)) {
> +	if (timer->is_soft && base && base->cpu_base &&
> is_migration_base(base)) {

I'm sorry, but no, I don't think that can be it. For !SMP (my case),
is_migration_base() is always false, so with or without the above, the
whole if() is false.
Also, the symptoms do not look like a NULL pointer deref, but more like a dead (or live) lock - so I'm guessing (and that's just a wild guess) that the lock/unlock sequence is needed to give some other thread a priority boost to make the whole machine make forward progress. Rasmus
Re: [PATCH v4] string-choice: add yesno(), onoff(), enableddisabled(), plural() helpers
On 23/10/2019 15.13, Jani Nikula wrote:
> The kernel has plenty of ternary operators to choose between constant
> strings, such as condition ? "yes" : "no", as well as value == 1 ? "" : "s":
>
>
> v4: Massaged commit message about space savings to make it less fluffy
> based on Rasmus' feedback.

Thanks, it looks good to me. FWIW,

Acked-by: Rasmus Villemoes
[PATCH] powerpc/85xx: remove mostly pointless mpc85xx_qe_init()
Since commit 302c059f2e7b (QE: use subsys_initcall to init qe),
mpc85xx_qe_init() has done nothing apart from possibly emitting a
pr_err(). As part of reducing the amount of QE-related code in
arch/powerpc/ (and eventually supporting QE on other architectures),
remove this low-hanging fruit.

Signed-off-by: Rasmus Villemoes
---
 arch/powerpc/platforms/85xx/common.c          | 23 ---
 arch/powerpc/platforms/85xx/corenet_generic.c |  2 --
 arch/powerpc/platforms/85xx/mpc85xx.h         |  2 --
 arch/powerpc/platforms/85xx/mpc85xx_mds.c     |  1 -
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c     |  1 -
 arch/powerpc/platforms/85xx/twr_p102x.c       |  1 -
 6 files changed, 30 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/common.c b/arch/powerpc/platforms/85xx/common.c
index fe0606439b5a..a554b6d87cf7 100644
--- a/arch/powerpc/platforms/85xx/common.c
+++ b/arch/powerpc/platforms/85xx/common.c
@@ -86,29 +86,6 @@ void __init mpc85xx_cpm2_pic_init(void)
 #endif
 
 #ifdef CONFIG_QUICC_ENGINE
-void __init mpc85xx_qe_init(void)
-{
-	struct device_node *np;
-
-	np = of_find_compatible_node(NULL, NULL, "fsl,qe");
-	if (!np) {
-		np = of_find_node_by_name(NULL, "qe");
-		if (!np) {
-			pr_err("%s: Could not find Quicc Engine node\n",
-				__func__);
-			return;
-		}
-	}
-
-	if (!of_device_is_available(np)) {
-		of_node_put(np);
-		return;
-	}
-
-	of_node_put(np);
-
-}
-
 void __init mpc85xx_qe_par_io_init(void)
 {
 	struct device_node *np;
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
index 7ee2c6628f64..a328a741b457 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -66,8 +66,6 @@ void __init corenet_gen_setup_arch(void)
 	swiotlb_detect_4g();
 
 	pr_info("%s board\n", ppc_md.name);
-
-	mpc85xx_qe_init();
 }
 
 static const struct of_device_id of_device_ids[] = {
diff --git a/arch/powerpc/platforms/85xx/mpc85xx.h b/arch/powerpc/platforms/85xx/mpc85xx.h
index fa23f9b0592c..cb84c5c56c36 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx.h
+++ b/arch/powerpc/platforms/85xx/mpc85xx.h
@@ -10,10 +10,8 @@ static inline void __init mpc85xx_cpm2_pic_init(void) {}
 #endif /* CONFIG_CPM2 */
 
 #ifdef CONFIG_QUICC_ENGINE
-extern void mpc85xx_qe_init(void);
 extern void mpc85xx_qe_par_io_init(void);
 #else
-static inline void __init mpc85xx_qe_init(void) {}
 static inline void __init mpc85xx_qe_par_io_init(void) {}
 #endif
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_mds.c b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
index 5ca254256c47..120633f99ea6 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_mds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
@@ -238,7 +238,6 @@ static void __init mpc85xx_mds_qe_init(void)
 {
 	struct device_node *np;
 
-	mpc85xx_qe_init();
 	mpc85xx_qe_par_io_init();
 
 	mpc85xx_mds_reset_ucc_phys();
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index d3c540ee558f..7f9a84f85766 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -89,7 +89,6 @@ static void __init mpc85xx_rdb_setup_arch(void)
 	fsl_pci_assign_primary();
 
 #ifdef CONFIG_QUICC_ENGINE
-	mpc85xx_qe_init();
 	mpc85xx_qe_par_io_init();
 #if defined(CONFIG_UCC_GETH) || defined(CONFIG_SERIAL_QE)
 	if (machine_is(p1025_rdb)) {
diff --git a/arch/powerpc/platforms/85xx/twr_p102x.c b/arch/powerpc/platforms/85xx/twr_p102x.c
index 720b0c0f03ba..6c3c0cdaee9a 100644
--- a/arch/powerpc/platforms/85xx/twr_p102x.c
+++ b/arch/powerpc/platforms/85xx/twr_p102x.c
@@ -72,7 +72,6 @@ static void __init twr_p1025_setup_arch(void)
 	fsl_pci_assign_primary();
 
 #ifdef CONFIG_QUICC_ENGINE
-	mpc85xx_qe_init();
 	mpc85xx_qe_par_io_init();
 
 #if IS_ENABLED(CONFIG_UCC_GETH) || IS_ENABLED(CONFIG_SERIAL_QE)
-- 
2.23.0
Re: [PATCH 3/7] soc: fsl: qe: avoid ppc-specific io accessors
On 22/10/2019 17.01, Christophe Leroy wrote:
>
> On 10/18/2019 12:52 PM, Rasmus Villemoes wrote:
>> In preparation for allowing to build QE support for architectures
>> other than PPC, replace the ppc-specific io accessors. Done via
>>
>
> This patch is not transparent in terms of performance, functions get
> changed significantly.
>
> Before the patch:
>
> 0330 :
>  330:	81 43 00 04 	lwz     r10,4(r3)
>  334:	7c 00 04 ac 	hwsync
>  338:	81 2a 00 00 	lwz     r9,0(r10)
>  33c:	0c 09 00 00 	twi     0,r9,0
>  340:	4c 00 01 2c 	isync
>  344:	70 88 00 02 	andi.   r8,r4,2
>  348:	41 82 00 10 	beq     358
>  34c:	39 00 00 01 	li      r8,1
>  350:	91 03 00 10 	stw     r8,16(r3)
>  354:	61 29 00 10 	ori     r9,r9,16
>  358:	70 88 00 01 	andi.   r8,r4,1
>  35c:	41 82 00 10 	beq     36c
>  360:	39 00 00 01 	li      r8,1
>  364:	91 03 00 14 	stw     r8,20(r3)
>  368:	61 29 00 20 	ori     r9,r9,32
>  36c:	7c 00 04 ac 	hwsync
>  370:	91 2a 00 00 	stw     r9,0(r10)
>  374:	4e 80 00 20 	blr
>
> After the patch:
>
> 030c :
>  30c:	94 21 ff e0 	stwu    r1,-32(r1)
>  310:	7c 08 02 a6 	mflr    r0
>  314:	bf a1 00 14 	stmw    r29,20(r1)
>  318:	7c 9f 23 78 	mr      r31,r4
>  31c:	90 01 00 24 	stw     r0,36(r1)
>  320:	7c 7e 1b 78 	mr      r30,r3
>  324:	83 a3 00 04 	lwz     r29,4(r3)
>  328:	7f a3 eb 78 	mr      r3,r29
>  32c:	48 00 00 01 	bl      32c
> 			32c: R_PPC_REL24	ioread32be
>  330:	73 e9 00 02 	andi.   r9,r31,2
>  334:	41 82 00 10 	beq     344
>  338:	39 20 00 01 	li      r9,1
>  33c:	91 3e 00 10 	stw     r9,16(r30)
>  340:	60 63 00 10 	ori     r3,r3,16
>  344:	73 e9 00 01 	andi.   r9,r31,1
>  348:	41 82 00 10 	beq     358
>  34c:	39 20 00 01 	li      r9,1
>  350:	91 3e 00 14 	stw     r9,20(r30)
>  354:	60 63 00 20 	ori     r3,r3,32
>  358:	80 01 00 24 	lwz     r0,36(r1)
>  35c:	7f a4 eb 78 	mr      r4,r29
>  360:	bb a1 00 14 	lmw     r29,20(r1)
>  364:	7c 08 03 a6 	mtlr    r0
>  368:	38 21 00 20 	addi    r1,r1,32
>  36c:	48 00 00 00 	b       36c
> 			36c: R_PPC_REL24	iowrite32be

True. Do you know why powerpc uses out-of-line versions of these
accessors when !PPC_INDIRECT_PIO, i.e. at least all of PPC32?
It's quite a bit beyond the scope of this series, but I'd expect that
moving most if not all of arch/powerpc/kernel/iomap.c into asm/io.h
(guarded by !defined(CONFIG_PPC_INDIRECT_PIO) of course) as static
inlines would benefit all ppc32 users of iowrite32 and friends.

Is there some other primitive available that (a) is defined on all
architectures (or at least both ppc and arm) and (b) expands to good
code in both/all cases?

Note that a few uses of the iowrite32be accessors have already appeared
in the qe code with the introduction of the qe_clrsetbits() helpers in
bb8b2062af.

Rasmus
Re: [PATCH 0/7] towards QE support on ARM
On 22/10/2019 04.24, Qiang Zhao wrote:
> On Mon, Oct 22, 2019 at 6:11 AM Leo Li wrote
>> Right. I'm really interested in getting this applied to my tree and make it
>> upstream. Zhao Qiang, can you help to review Rasmus's patches and
>> comment?
>
> As you know, I maintained a similar patchset removing PPC, and someone told
> me qe_ic should moved into drivers/irqchip/.
> I also thought qe_ic is a interrupt control driver, should be moved into dir
> irqchip.

Yes, and I also plan to do that at some point. However, that's
orthogonal to making the driver build on ARM, so I don't want to mix the
two. Making it usable on ARM is my/our priority currently. I'd
appreciate your input on my patches.

Rasmus
Re: [PATCH 0/7] towards QE support on ARM
On 22/10/2019 00.11, Li Yang wrote:
> On Mon, Oct 21, 2019 at 3:46 AM Rasmus Villemoes wrote:
>>
>>> Can you try the 4.14 branch from a newer LSDK release? LS1021a should
>>> be supported platform on LSDK. If it is broken, something is wrong.
>>
>> What newer release? LSDK-18.06-V4.14 is the latest -V4.14 tag at
>> https://github.com/qoriq-open-source/linux.git, and identical to the
>
> That tree has been abandoned for a while, we probably should state
> that in the github. The latest tree can be found at
> https://source.codeaurora.org/external/qoriq/qoriq-components/linux/

Ah. FYI, googling "LSDK" gives https://lsdk.github.io as one of the
first hits, and (apart from itself being a github url) that says on the
front page "Disaggregated components of LSDK are available in github.".
But yes, navigating to the Components tab and from there to lsdk linux
one does get directed at codeaurora.

>> In any case, we have zero interest in running an NXP kernel. Maybe I
>> should clarify what I meant by "based on commits from" above: We're
>> currently running a mainline 4.14 kernel on LS1021A, with a few patches
>> inspired from the NXP 4.1 branch applied on top - but also with some
>> manual fixes for e.g. the pvr_version_is() issue. Now we want to move
>> that to a 4.19-based kernel (so that it aligns with our MPC8309 platform).
>
> We also provide 4.19 based kernel in the codeaurora repo. I think it
> will be helpful to reuse patches there if you want to make your own
> tree.

Again, we don't want to run off an NXP kernel, we want to get the
necessary pieces upstream. For now, we have to live with a patched 4.19
kernel, but hopefully by the time we switch to 5.x (for some x >= 5) we
don't need to supply anything other than our own .dts and defconfig.

>> Yes, as I said, I wanted to try a fresh approach since Zhao
>> Qiang's patches seemed to be getting nowhere. Splitting the patches into
>> smaller pieces is definitely part of that - for example, the completely
>> trivial whitespace fix in patch 1 is to make sure the later coccinelle
>> generated patch is precisely that (i.e., a later respin can just rerun
>> the coccinelle script, with zero manual fixups). I also want to avoid
>> mixing the ppcism cleanups with other things (e.g. replacing some
>> of_get_property() by of_property_read_u32()). And the "testing on ARM"
>> part comes once I get to actually building on ARM. But there's not much
>> point doing all that unless there's some indication that this can be
>> applied to some tree that actually feeds into Linus', which is why I
>> started with a few trivial patches and precisely to start this discussion.
>
> Right. I'm really interested in getting this applied to my tree and
> make it upstream. Zhao Qiang, can you help to review Rasmus's patches
> and comment?

Thanks, this is exactly what I was hoping for. Even just getting these
first rather trivial patches (in that they don't attempt to build for
ARM, or change functionality at all for PPC) merged for 5.5 would reduce
the amount of out-of-tree patches that we (and NXP for that matter)
would have to carry. I'll take the above as a go-ahead for me to try to
post more patches working towards enabling some of the QE drivers for
ARM.

Rasmus