Re: [PATCH v3] lib: Convert test_printf.c to KUnit

2020-11-04 Thread Rasmus Villemoes
On 03/11/2020 17.11, Petr Mladek wrote:
> On Tue 2020-11-03 12:52:23, Greg KH wrote:
>> On Tue, Nov 03, 2020 at 01:33:53PM +0200, Andy Shevchenko wrote:
>>> On Tue, Nov 03, 2020 at 04:40:49PM +0530, Arpitha Raghunandan wrote:
 Convert test lib/test_printf.c to KUnit. More information about
 KUnit can be found at:
 https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
 KUnit provides a common framework for unit tests in the kernel.
 KUnit and kselftest are standardizing around KTAP; converting this
 test to KUnit makes this test output in KTAP, which we are trying to
 make the standard test result format for the kernel. More about
 the KTAP format can be found at:
 https://lore.kernel.org/linux-kselftest/cy4pr13mb1175b804e31e502221bc8163fd...@cy4pr13mb1175.namprd13.prod.outlook.com/.
 I ran both the original and converted tests as is to produce the
 output for success of the test in the two cases. I also ran these
 tests with a small modification to show the difference in the output
 for failure of the test in both cases. The modification I made is:
 - test("127.000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, 
 &sa.sin_addr);
 + test("127-000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, 
 &sa.sin_addr);

 Original test success:
 [0.540860] test_printf: loaded.
 [0.540863] test_printf: random seed = 0x5c46c33837bc0619
 [0.541022] test_printf: all 388 tests passed

 Original test failure:
 [0.537980] test_printf: loaded.
 [0.537983] test_printf: random seed = 0x1bc1efd881954afb
 [0.538029] test_printf: vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote 
 '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
 [0.538030] test_printf: kvasprintf(..., "%pi4|%pI4", ...) returned 
 '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
 [0.538124] test_printf: failed 2 out of 388 tests
 [0.538125] test_printf: random seed used was 0x1bc1efd881954afb

 Converted test success:
     # Subtest: printf
     1..25
     ok 1 - test_basic
     ok 2 - test_number
     ok 3 - test_string
     ok 4 - plain
     ok 5 - null_pointer
     ok 6 - error_pointer
     ok 7 - invalid_pointer
     ok 8 - symbol_ptr
     ok 9 - kernel_ptr
     ok 10 - struct_resource
     ok 11 - addr
     ok 12 - escaped_str
     ok 13 - hex_string
     ok 14 - mac
     ok 15 - ip
     ok 16 - uuid
     ok 17 - dentry
     ok 18 - struct_va_format
     ok 19 - time_and_date
     ok 20 - struct_clk
     ok 21 - bitmap
     ok 22 - netdev_features
     ok 23 - flags
     ok 24 - errptr
     ok 25 - fwnode_pointer
 ok 1 - printf

 Converted test failure:
     # Subtest: printf
     1..25
     ok 1 - test_basic
     ok 2 - test_number
     ok 3 - test_string
     ok 4 - plain
     ok 5 - null_pointer
     ok 6 - error_pointer
     ok 7 - invalid_pointer
     ok 8 - symbol_ptr
     ok 9 - kernel_ptr
     ok 10 - struct_resource
     ok 11 - addr
     ok 12 - escaped_str
     ok 13 - hex_string
     ok 14 - mac
     # ip: EXPECTATION FAILED at lib/printf_kunit.c:82
 vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote '127.000.000.001|127.0.0.1', 
 expected '127-000.000.001|127.0.0.1'
     # ip: EXPECTATION FAILED at lib/printf_kunit.c:124
 kvasprintf(..., "%pi4|%pI4", ...) returned '127.000.000.001|127.0.0.1', 
 expected '127-000.000.001|127.0.0.1'
     not ok 15 - ip
     ok 16 - uuid
     ok 17 - dentry
     ok 18 - struct_va_format
     ok 19 - time_and_date
     ok 20 - struct_clk
     ok 21 - bitmap
     ok 22 - netdev_features
     ok 23 - flags
     ok 24 - errptr
     ok 25 - fwnode_pointer
 not ok 1 - printf
>>>
>>> Better, indeed.
>>>
>>> But can this be improved to have cumulative statistics, like showing only
>>> the number of total, succeeded and failed tests, with details of the latter?
>>
>> Is that the proper test output format?  We have a standard...

Actually more like xkcd#927.

> 
> What is the standard, please?
> 
> The original module produced:
> 
> [   48.577739] test_printf: loaded.
> [   48.578046] test_printf: all 388 tests passed
> 
> It comes from KSTM_MODULE_LOADERS() macro that has been created
> by the commit eebf4dd452377921e3a26 ("kselftest: Add test module
> framework header"). 

That's a bit of a simplification. That code was clearly lifted directly
from the original test_printf.c code
(707cc7280f452a162c52bc240eae62568b9753c2). test_bitmap.c cloned
test_printf.c (including a "Test cases for printf facility" comment...).
Later both were converted to use the KSTM header.

As the one who came up with that originally, I certainly endorse its use
as a standard way of producing a final, free-

Re: [EXT] Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt

2020-11-03 Thread Rasmus Villemoes
On 02/11/2020 22.22, Leo Li wrote:
>>>
>>> Where did you get this information that the register on LS1043 and
>>> LS1046 is bit reversed?  I cannot find such information in the RM.
>>> And does this mean all other SCFG registers are also bit reversed?  If
>>> this is some information that is not covered by the RM, we probably
>>> should clarify it in the code and the commit message.
>> Hi Leo,
>>
>> I directly use the same logic to write the bit(field IRQ0~11INTP) of the
>> register SCFG_INTPCR in LS1043A and LS1046A.
>> Such as,
>> if I want to control the polarity of IRQ0(field IRQ0INTP, IRQ0 is active 
>> low) of
>> LS1043A/LS1046A, then I just need write a value 1 << (31 - 0) to it.
>> The logic depends on register's definition in LS1043A/LS1046A's RM.
> 
> Ok.  The SCFG_SCFGREVCR seems to be a one-off fixup only existed on LS1021.  
> And it is mandatory to be bit_reversed according to the RM which is already 
> taken care of in the RCW.  So the bit reversed case should be the only case 
> supported otherwise a lot of other places for SCFG access should be failed.
> 
> I think we should remove the bit_reverse thing all together from the driver 
> for good.  This will prevent future confusion.  Rasmus, what do you think?

Yes, all the ls1021a-derived boards I know of do have something like

# Initialize bit reverse of SCFG registers
09570200 

in their pre-boot-loader config file. And yes, the RM does say

  This register must be written 0xFFFF_FFFF as a part of
  initialization sequence before writing to any other SCFG
  register.

but nowhere does it say "or else...", nor a little honest addendum
"because we accidentally released broken silicon with this misfeature
_and_ wrong POR value".

Can we have an official statement from NXP stating that SCFGREVCR is a
hardware design bug? And can you send it through a time machine so I'd
have had it three years ago, avoiding the whole "fsl,bit-reverse
device-tree-property, no, read the register if you're on an ls1021a and
decide" hullabaloo.

Rasmus


Re: [PATCH 0/4] deterministic random testing

2020-10-30 Thread Rasmus Villemoes
On 26/10/2020 11.59, Andy Shevchenko wrote:
> On Sun, Oct 25, 2020 at 10:48:38PM +0100, Rasmus Villemoes wrote:
>> This is a bit of a mixed bag.
>>
>> The background is that I have some sort() and list_sort() rework
>> planned, but as part of that series I want to extend their test
>> suites somewhat to make sure I don't goof up - and I want to use lots
>> of random list lengths with random contents to increase the chance of
>> somebody eventually hitting "hey, sort() is broken when the length is
>> 3 less than a power of 2 and only the last two elements are out of
>> order". But when such a case is hit, it's vitally important that the
>> developer can reproduce the exact same test case, which means using a
>> deterministic sequence of random numbers.
>>
>> Since Petr noticed [1] the non-determinism in test_printf in
>> connection with Arpitha's work on rewriting it to kunit, this prompted
>> me to use test_printf as a first place to apply that principle, and
>> get the infrastructure in place that will avoid repeating the "module
>> parameter/seed the rnd_state/report the seed used" boilerplate in each
>> module.
>>
>> Shuah, assuming the kselftest_module.h changes are ok, I think it's
>> most natural if you carry these patches, though I'd be happy with any
>> other route as well.
> 
> Completely in favour of this.
> 
> Reviewed-by: Andy Shevchenko 

Thanks.

> One note though. AFAIU the global variables are always being used in the
> modules that include the corresponding header. Otherwise we might have an 
> extra
> warning(s). I believe you have compiled with W=1 to exclude other cases.

Yes, I unconditionally define the two new variables. gcc doesn't warn
about them being unused, since they are referenced from inside a

  if (0) {}

block. And when those references are the only ones, gcc is smart enough
to elide the static variables completely, so they don't even take up
space in .data (or .init.data) - you can verify by running nm on
test_printf.o and test_bitmap.o - the former has 'seed' and 'rnd_state'
symbols, the latter does not.

I did it that way to reduce the need for explicit preprocessor
conditionals inside C functions.
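
A minimal standalone sketch of the pattern, with made-up names (not the
actual kselftest_module.h code):

#define USE_RANDOM 0                     /* stands in for KSTM_RANDOM */

static unsigned long long seed;          /* unconditionally defined */

static int init_tests(void)
{
	if (USE_RANDOM) {                /* constant 0 here */
		seed = 42;               /* only reference to 'seed' */
	}
	return 0;
}

gcc sees the reference, so it doesn't warn about 'seed' being unused,
but after dead-code elimination nothing refers to it anymore, so the
definition itself gets dropped from the object file.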

Rasmus


Re: [PATCH] seqlock: avoid -Wshadow warnings

2020-10-28 Thread Rasmus Villemoes
On 28/10/2020 00.34, Arnd Bergmann wrote:
> On Mon, Oct 26, 2020 at 5:58 PM Peter Zijlstra  wrote:
>>
>> On Mon, Oct 26, 2020 at 05:50:38PM +0100, Arnd Bergmann wrote:
>>
>>> - unsigned seq;   \
>>> + unsigned __seq; \
>>
>>> - unsigned seq = __read_seqcount_begin(s);\
>>> + unsigned _seq = __read_seqcount_begin(s);   \
>>
>>> - unsigned seq = __seqcount_sequence(s);  \
>>> + unsigned __seq = __seqcount_sequence(s);\
>>
>> Can we have a consistent number of leading '_' ?
> 
> Not really ;-)
> 
> The warning comes from raw_read_seqcount_begin() calling
> __read_seqcount_begin() and both using the same variable
> name. I could rename one of them  and use double-underscores
> for both, but I haven't come up with a good alternative name
> that wouldn't make it less consistent in the process.

At least x86's put_user and get_user use _pu/_gu suffixes on their local
variables, so perhaps that could be made a weak default convention?

__seq_rsb
__seq_rrsb
__seq_rrs

Hm, or perhaps not. But it's still better than triplicating each macro
to do a UNIQUE_ID dance.
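
For illustration, the shape would be something like this (a rough
sketch, not the actual seqlock.h definitions) - each macro uses its own
suffixed local, so nothing gets shadowed when one expands into another:

#define __read_seqcount_begin(s)				\
({								\
	unsigned __seq_rsb;					\
								\
	while ((__seq_rsb = __seqcount_sequence(s)) & 1)	\
		cpu_relax();					\
	__seq_rsb;						\
})

#define raw_read_seqcount_begin(s)				\
({								\
	unsigned __seq_rrsb = __read_seqcount_begin(s);		\
	smp_rmb();						\
	__seq_rrsb;						\
})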

Rasmus


Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno

2020-10-27 Thread Rasmus Villemoes
On 27/10/2020 10.12, Petr Mladek wrote:
> On Tue 2020-10-27 09:46:28, Arnd Bergmann wrote:
>> On Tue, Oct 27, 2020 at 8:23 AM Rasmus Villemoes
>>  wrote:
>>> On 26/10/2020 22.49, Arnd Bergmann wrote:
>>>> From: Arnd Bergmann 
>>>
>>> NAK. That would end up using the "EDEADLOCK" string for the value 35 on
>>> those architectures where they are the same, despite EDEADLK being by far
>>> the most used symbol. See the comments and original commit log;
>>> the placement of these is deliberate.
> 
> Good point.
> 
>> Ok, I see.
>>
>>> How about we do this instead?
>>>
>>> when building with W=1. As the use of multiple initializers for the
>>> same entry here is quite deliberate, explicitly disable that warning
>>> for errname.o.
>>>
>>> diff --git a/lib/Makefile b/lib/Makefile
>>> index ce45af50983a2a5e3582..a98119519e100103818d 100644
>>> --- a/lib/Makefile
>>> +++ b/lib/Makefile
>>> @@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
>>>
>>>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>>>  obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
>>> +CFLAGS_errname.o += $(call cc-disable-warning, override-init)
>>>
>>
>> This works, but it disables a potentially useful warning in case we
>> get another conflict in this file, so I'd prefer to find a way to
>> avoid the warning rather than force-disabling it.
> 
> Yeah, I think that it is better to explicitly disable the less used
> variant in the code than hiding the double initialization. It will
> be clear what is going on.
> 
> 
>> How about adding the #ifdef around the EDEADLOCK line
>> instead of the EDEADLK one? Something like
>>
>> diff --git a/lib/errname.c b/lib/errname.c
>> index 0c4d3e66170e..93043fb960cc 100644
>> --- a/lib/errname.c
>> +++ b/lib/errname.c
>> @@ -38,7 +38,10 @@ static const char *names_0[] = {
>> E(ECOMM),
>> E(ECONNABORTED),
>> E(ECONNRESET),
>> +   E(EDEADLK), /* EDEADLOCK */
>> +#if EDEADLK != EDEADLOCK /* mips, sparc, powerpc */
>> E(EDEADLOCK),
>> +#endif
>> E(EDESTADDRREQ),
>> E(EDOM),
>> E(EDOTDOT),
>> @@ -169,7 +172,6 @@ static const char *names_0[] = {
>> E(ECANCELED), /* ECANCELLED */
>> E(EAGAIN), /* EWOULDBLOCK */
>> E(ECONNREFUSED), /* EREFUSED */
>> -   E(EDEADLK), /* EDEADLOCK */
> 
> This should stay :-)
> 

No, Arnd moved it next to EDEADLOCK, which is fine (it can lose the
comment /* EDEADLOCK */, though; the comment on the ifdef is
sufficient). Especially when:

> And we should remove the ECANCELLED definition. It is always the same
> as ECANCELED and replaced. We do not define EWOULDBLOCK and
> EREFUSED either.

Yes, I'm not sure why I elided EWOULDBLOCK and EREFUSED but not
ECANCELLED. So let's move EAGAIN, ECONNREFUSED and ECANCELED to their
proper alphabetic place. But I also want to add a check that the things
we've elided match some value that we do handle. So add something like

#ifdef EREFUSED /* parisc */
static_assert(EREFUSED == ECONNREFUSED);
#endif

#ifdef ECANCELLED /* parisc */
static_assert(ECANCELLED == ECANCELED);
#endif

static_assert(EAGAIN == EWOULDBLOCK); /* everywhere */

so that if we ever import some arch that defines EREFUSED to something
other than ECONNREFUSED, it would be caught. Essentially, errname.c
should mention every #define E* that appears in any errno*.h.

Rasmus


Re: [v2 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt

2020-10-27 Thread Rasmus Villemoes
On 27/10/2020 05.46, Biwen Li wrote:
> From: Hou Zhiqiang 
> 
> Add an new IRQ chip declaration for LS1043A and LS1088A
> - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A. SCFG_INTPCR[31:0]
>   of these SoCs is stored/read as SCFG_INTPCR[0:31] defaultly(bit
>   reverse)

s/defaultly/by default/ I suppose. But what does that mean? Is it still
configurable, just now through some undocumented register? If that
register still exists, does it now have a reset value of all-ones as
opposed to the ls1021 case? If it's not configurable, then describing
the situation as "by default" is confusing and wrong; it should just say
"On LS1043A, LS1046A, SCFG_INTPCR is stored/read bit-reversed."


> - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA
> 
> Signed-off-by: Hou Zhiqiang 
> Signed-off-by: Biwen Li 
> ---
> Change in v2:
>   - add despcription of bit reverse
>   - update copyright
> 
>  drivers/irqchip/irq-ls-extirq.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c
> index 4d1179fed77c..9587bc2607fc 100644
> --- a/drivers/irqchip/irq-ls-extirq.c
> +++ b/drivers/irqchip/irq-ls-extirq.c
> @@ -1,5 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
> -
> +/*
> + * Author: Rasmus Villemoes 

If I wanted my name splattered all over the files I touch or add, I'd
add it myself, TYVM. The git history is plenty fine for recording
authorship as far as I'm concerned, and I absolutely abhor having to
skip over any kind of legalese boilerplate when opening a file.

Rasmus


Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno

2020-10-27 Thread Rasmus Villemoes
On 27/10/2020 08.23, Rasmus Villemoes wrote:

> How about we do this instead?
> 
> From: Rasmus Villemoes 

Missed:

Subject: lib/errname.o: disable -Woverride-init

> 
> The table of errno value->name contains a few duplicate entries since
> e.g. EDEADLK == EDEADLOCK on most architectures. For the known cases,
> the most used symbolic constant is listed last so that takes
> precedence - the idea being that if someone sees "can't do that:
> -EDEADLK" in dmesg, grepping for EDEADLK is more likely to find the
> place where that error was generated (grepping for "can't do that"
> will find the printk() that emitted it, but the source would often be
> a few calls down).
> 
> However, that means one gets
> 
>   warning: initialized field overwritten [-Woverride-init]
> 
> when building with W=1. As the use of multiple initializers for the
> same entry here is quite deliberate, explicitly disable that warning
> for errname.o.
> 
> Reported-by: Arnd Bergmann 
> Fixes: 57f5677e535b ("printf: add support for printing symbolic error
> names")
> Signed-off-by: Rasmus Villemoes 
> ---
>  lib/Makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index ce45af50983a2a5e3582..a98119519e100103818d 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> 
>  obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
>  obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
> +CFLAGS_errname.o += $(call cc-disable-warning, override-init)
> 
>  obj-$(CONFIG_NLATTR) += nlattr.o
> 


-- 
Rasmus Villemoes
Software Developer
Prevas A/S
Hedeager 3
DK-8200 Aarhus N
+45 51210274
rasmus.villem...@prevas.dk
www.prevas.dk


Re: [PATCH] printf: fix Woverride-init warning for EDEADLK errno

2020-10-27 Thread Rasmus Villemoes
On 26/10/2020 22.49, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> On most architectures, gcc -Wextra warns about the list of error
> numbers containing both EDEADLK and EDEADLOCK:
> 
> lib/errname.c:15:67: warning: initialized field overwritten [-Woverride-init]
>15 | #define E(err) [err + BUILD_BUG_ON_ZERO(err <= 0 || err > 300)] = "-" 
> #err
>   |   ^~~
> lib/errname.c:172:2: note: in expansion of macro 'E'
>   172 |  E(EDEADLK), /* EDEADLOCK */
>   |  ^
> lib/errname.c:15:67: note: (near initialization for 'names_0[35]')
>15 | #define E(err) [err + BUILD_BUG_ON_ZERO(err <= 0 || err > 300)] = "-" 
> #err
>   |   ^~~
> lib/errname.c:172:2: note: in expansion of macro 'E'
>   172 |  E(EDEADLK), /* EDEADLOCK */
>   |  ^
> 
> Make that line conditional on the two values being distinct.
> 

NAK. That would end up using the "EDEADLOCK" string for the value 35 on
those architectures where they are the same, despite EDEADLK being by far
the most used symbol. See the comments and original commit log;
the placement of these is deliberate.
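
The mechanism relied upon is simply that with designated initializers,
the last initializer for a given index wins - a toy example (index and
strings purely illustrative, not the actual table):

static const char *names[] = {
	[35] = "-EDEADLOCK",	/* overridden below */
	[35] = "-EDEADLK",	/* what names[35] actually ends up as */
};

which is of course exactly the construct -Woverride-init complains
about.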

How about we do this instead?

From: Rasmus Villemoes 

The table of errno value->name contains a few duplicate entries since
e.g. EDEADLK == EDEADLOCK on most architectures. For the known cases,
the most used symbolic constant is listed last so that takes
precedence - the idea being that if someone sees "can't do that:
-EDEADLK" in dmesg, grepping for EDEADLK is more likely to find the
place where that error was generated (grepping for "can't do that"
will find the printk() that emitted it, but the source would often be
a few calls down).

However, that means one gets

  warning: initialized field overwritten [-Woverride-init]

when building with W=1. As the use of multiple initializers for the
same entry here is quite deliberate, explicitly disable that warning
for errname.o.

Reported-by: Arnd Bergmann 
Fixes: 57f5677e535b ("printf: add support for printing symbolic error
names")
Signed-off-by: Rasmus Villemoes 
---
 lib/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Makefile b/lib/Makefile
index ce45af50983a2a5e3582..a98119519e100103818d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -224,6 +224,7 @@ obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o

 obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o
 obj-$(CONFIG_SYMBOLIC_ERRNAME) += errname.o
+CFLAGS_errname.o += $(call cc-disable-warning, override-init)

 obj-$(CONFIG_NLATTR) += nlattr.o

-- 
2.23.0



Re: [RESEND 01/11] irqchip: ls-extirq: Add LS1043A, LS1088A external interrupt

2020-10-26 Thread Rasmus Villemoes
On 26/10/2020 09.44, Marc Zyngier wrote:
> On 2020-10-26 08:01, Biwen Li wrote:
>> From: Hou Zhiqiang 
>>
>> Add an new IRQ chip declaration for LS1043A and LS1088A
>> - compatible "fsl,ls1043a-extirq" for LS1043A, LS1046A
>> - compatible "fsl,ls1088a-extirq" for LS1088A, LS208xA, LX216xA
> 
> Three things:
> - This commit message doesn't describe the bit_reverse change

Yeah, please elaborate on that, as the RM for 1043 or 1046 doesn't
mention anything about bit reversal for the scfg registers - they don't
seem to have the utter nonsense that is SCFG_SCFGREVCR, but perhaps,
instead of removing it, that has just become a hard-coded part of the IP.

Also, IANAL etc., but

>> +// Copyright 2019-2020 NXP

really? Seems to be a bit of a stretch.

At the very least, cc'ing the original author and only person to ever
touch that file would have been appreciated.

Rasmus


[PATCH 2/4] kselftest_module.h: unconditionally expand the KSTM_MODULE_GLOBALS() macro

2020-10-25 Thread Rasmus Villemoes
Two out of three users of the kselftest_module.h header
manually define the failed_tests/total_tests variables instead of
making use of the KSTM_MODULE_GLOBALS() macro. However, instead of
just replacing those definitions with an invocation of that macro,
just unconditionally define them in the header file itself.

A coming change will add a few more global variables, and at least one
of those will be referenced from kstm_report() - however, that's not
possible currently, since when the definition is postponed until the
test module invokes KSTM_MODULE_GLOBALS(), the variable is not defined
by the time the compiler parses kstm_report().

Signed-off-by: Rasmus Villemoes 
---
 Documentation/dev-tools/kselftest.rst  | 2 --
 lib/test_bitmap.c  | 3 ---
 lib/test_printf.c  | 2 --
 lib/test_strscpy.c | 2 --
 tools/testing/selftests/kselftest_module.h | 5 ++---
 5 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/Documentation/dev-tools/kselftest.rst 
b/Documentation/dev-tools/kselftest.rst
index a901def730d95ca4c2c1..9899e86ed470ae527fdc 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -281,8 +281,6 @@ A bare bones test module might look like this:
 
#include "../tools/testing/selftests/kselftest/module.h"
 
-   KSTM_MODULE_GLOBALS();
-
/*
 * Kernel module for testing the foobinator
 */
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 4425a1dd4ef1c7d85973..02fc667a9b3d5d7de7eb 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -16,9 +16,6 @@
 
 #include "../tools/testing/selftests/kselftest_module.h"
 
-static unsigned total_tests __initdata;
-static unsigned failed_tests __initdata;
-
 static char pbl_buffer[PAGE_SIZE] __initdata;
 
 static const unsigned long exp1[] __initconst = {
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 7ac87f18a10ff8209ad5..1ed4a27390cb621715ab 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -30,8 +30,6 @@
 #define PAD_SIZE 16
 #define FILL_CHAR '$'
 
-static unsigned total_tests __initdata;
-static unsigned failed_tests __initdata;
 static char *test_buffer __initdata;
 static char *alloced_buffer __initdata;
 
diff --git a/lib/test_strscpy.c b/lib/test_strscpy.c
index a827f94601f5d945b163..be477a52d87185ee6a01 100644
--- a/lib/test_strscpy.c
+++ b/lib/test_strscpy.c
@@ -10,8 +10,6 @@
  * Kernel module for testing 'strscpy' family of functions.
  */
 
-KSTM_MODULE_GLOBALS();
-
 /*
  * tc() - Run a specific test case.
  * @src: Source string, argument to strscpy_pad()
diff --git a/tools/testing/selftests/kselftest_module.h 
b/tools/testing/selftests/kselftest_module.h
index e8eafaf0941aa716d9dc..c81c0b0c054befaf665b 100644
--- a/tools/testing/selftests/kselftest_module.h
+++ b/tools/testing/selftests/kselftest_module.h
@@ -9,9 +9,8 @@
  * See Documentation/dev-tools/kselftest.rst for an example test module.
  */
 
-#define KSTM_MODULE_GLOBALS()  \
-static unsigned int total_tests __initdata;\
-static unsigned int failed_tests __initdata
+static unsigned int total_tests __initdata;
+static unsigned int failed_tests __initdata;
 
 #define KSTM_CHECK_ZERO(x) do {
\
total_tests++;  \
-- 
2.23.0



[PATCH 1/4] prandom.h: add *_state variant of prandom_u32_max

2020-10-25 Thread Rasmus Villemoes
It is useful for test modules that make use of random numbers to allow
the exact same series of test cases to be repeated (e.g., after fixing
a bug in the code being tested). For that, the test module needs to
obtain its random numbers from a private state that can be seeded by a
known seed, e.g. given as a module parameter (and using a random seed
when that parameter is not given).

There's a few test modules I'm going to modify to follow that
scheme. As preparation, add a _state variant of the existing
prandom_u32_max(), and for convenience, also add a variant that
produces a value in a given range.

Signed-off-by: Rasmus Villemoes 
---
 include/linux/prandom.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e6468f91e79e1f31..58ffcd56c705be34fb98 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -46,6 +46,35 @@ static inline u32 prandom_u32_max(u32 ep_ro)
return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
 }
 
+/**
+ * prandom_u32_max_state - get pseudo-random number in interval [0, hi)
+ *
+ * Like prandom_u32_max, but use the given state structure.
+ * @state: pointer to state structure
+ * @hi: (exclusive) upper bound
+ *
+ * Exception: If @hi == 0, this returns 0.
+ */
+static inline u32 prandom_u32_max_state(struct rnd_state *state, u32 hi)
+{
+   return ((u64)prandom_u32_state(state) * hi) >> 32;
+}
+
+/**
+ * prandom_u32_range_state - get pseudo-random number in interval [lo, hi)
+ *
+ * @state: pointer to state structure
+ * @lo: (inclusive) lower bound
+ * @hi: (exclusive) upper bound
+ *
+ * Exception: If @lo == @hi, this returns @lo. Results are unspecified
+ * for @lo > @hi.
+ */
+static inline u32 prandom_u32_range_state(struct rnd_state *state, u32 lo, u32 
hi)
+{
+   return lo + prandom_u32_max_state(state, hi - lo);
+}
+
 /*
  * Handle minimum values for seeds
  */
-- 
2.23.0



[PATCH 3/4] kselftest_module.h: add struct rnd_state and seed parameter

2020-10-25 Thread Rasmus Villemoes
Some test suites make use of random numbers to increase the test
coverage when the test suite gets run on different machines and
increase the chance of some corner case bug being discovered - and I'm
planning on extending some existing ones in that direction as
well. However, should a bug be found this way, it's important that the
exact same series of tests can be repeated to verify the bug is
fixed. That means the random numbers must be obtained
deterministically from a generator private to the test module.

To avoid adding boilerplate to various test modules, put some logic
into kselftest_module.h: If the module declares that it will use
random numbers, add a "seed" module parameter. If not explicitly given
when the module is loaded (or via kernel command line), obtain a
random one. In either case, print the seed used, and repeat that
information if there was at least one test failing.

Signed-off-by: Rasmus Villemoes 
---
 tools/testing/selftests/kselftest_module.h | 35 --
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kselftest_module.h 
b/tools/testing/selftests/kselftest_module.h
index c81c0b0c054befaf665b..43f3ca58fcd550b8ac83 100644
--- a/tools/testing/selftests/kselftest_module.h
+++ b/tools/testing/selftests/kselftest_module.h
@@ -3,14 +3,31 @@
 #define __KSELFTEST_MODULE_H
 
 #include 
+#include 
+#include 
 
 /*
  * Test framework for writing test modules to be loaded by kselftest.
  * See Documentation/dev-tools/kselftest.rst for an example test module.
  */
 
+/*
+ * If the test module makes use of random numbers, define KSTM_RANDOM
+ * to 1 before including this header. Then a module parameter "seed"
+ * will be defined. If not given, a random one will be obtained. In
+ * either case, the used seed is reported, so the exact same series of
+ * tests can be repeated by loading the module with that seed
+ * given.
+ */
+
+#ifndef KSTM_RANDOM
+#define KSTM_RANDOM 0
+#endif
+
 static unsigned int total_tests __initdata;
 static unsigned int failed_tests __initdata;
+static struct rnd_state rnd_state __initdata;
+static u64 seed __initdata;
 
 #define KSTM_CHECK_ZERO(x) do {
\
total_tests++;  \
@@ -22,11 +39,13 @@ static unsigned int failed_tests __initdata;
 
 static inline int kstm_report(unsigned int total_tests, unsigned int 
failed_tests)
 {
-   if (failed_tests == 0)
+   if (failed_tests == 0) {
pr_info("all %u tests passed\n", total_tests);
-   else
+   } else {
pr_warn("failed %u out of %u tests\n", failed_tests, 
total_tests);
-
+   if (KSTM_RANDOM)
+   pr_info("random seed used was 0x%016llx\n", seed);
+   }
return failed_tests ? -EINVAL : 0;
 }
 
@@ -34,6 +53,12 @@ static inline int kstm_report(unsigned int total_tests, 
unsigned int failed_test
 static int __init __module##_init(void)\
 {  \
pr_info("loaded.\n");   \
+   if (KSTM_RANDOM) {  \
+   if (!seed)  \
+   seed = get_random_u64();\
+   prandom_seed_state(&rnd_state, seed);   \
+   pr_info("random seed = 0x%016llx\n", seed); \
+   }   \
selftest(); \
return kstm_report(total_tests, failed_tests);  \
 }  \
@@ -44,4 +69,8 @@ static void __exit __module##_exit(void)  \
 module_init(__module##_init);  \
 module_exit(__module##_exit)
 
+#if KSTM_RANDOM
+module_param(seed, ullong, 0444);
+#endif
+
 #endif /* __KSELFTEST_MODULE_H */
-- 
2.23.0



[PATCH 4/4] lib/test_printf.c: use deterministic sequence of random numbers

2020-10-25 Thread Rasmus Villemoes
The printf test suite does each test with a few different buffer sizes
to ensure vsnprintf() behaves correctly with respect to truncation and
size reporting. It calls vsnprintf() with a buffer size that is
guaranteed to be big enough, a buffer size of 0 to ensure that nothing
gets written to the buffer, but it also calls vsnprintf() with a
buffer size chosen to guarantee the output gets truncated somewhere in
the middle.

That buffer size is chosen randomly to increase the chance of finding
some corner case bug (for example, there used to be some %p
extension that would fail to produce any output if there wasn't room
enough for it all, despite the requirement of producing as much as
there's room for). I'm not aware of that having found anything yet,
but should it happen, it's annoying not to be able to repeat the
test with the same sequence of truncated lengths.

For demonstration purposes, if we break one of the test cases
deliberately, we still get different buffer sizes if we don't pass the
seed parameter:

root@(none):/# modprobe test_printf
[   15.317783] test_printf: vsnprintf(buf, 18, "%piS|%pIS", ...) wrote 
'127.000.000.001|1', expected '127-000.000.001|1'
[   15.323182] test_printf: failed 3 out of 388 tests
[   15.324034] test_printf: random seed used was 0x278bb9311979cc91
modprobe: ERROR: could not insert 'test_printf': Invalid argument

root@(none):/# modprobe test_printf
[   13.940909] test_printf: vsnprintf(buf, 22, "%piS|%pIS", ...) wrote 
'127.000.000.001|127.0', expected '127-000.000.001|127.0'
[   13.944744] test_printf: failed 3 out of 388 tests
[   13.945607] test_printf: random seed used was 0x9f72eee1c9dc02e5
modprobe: ERROR: could not insert 'test_printf': Invalid argument

but to repeat a specific sequence of tests, we can do

root@(none):/# modprobe test_printf seed=0x9f72eee1c9dc02e5
[  448.328685] test_printf: vsnprintf(buf, 22, "%piS|%pIS", ...) wrote 
'127.000.000.001|127.0', expected '127-000.000.001|127.0'
[  448.331650] test_printf: failed 3 out of 388 tests
[  448.332295] test_printf: random seed used was 0x9f72eee1c9dc02e5
modprobe: ERROR: could not insert 'test_printf': Invalid argument

Signed-off-by: Rasmus Villemoes 
---
 lib/test_printf.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/test_printf.c b/lib/test_printf.c
index 1ed4a27390cb621715ab..bbea8b807d1eafe67e01 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -24,6 +24,7 @@
 
 #include 
 
+#define KSTM_RANDOM 1
 #include "../tools/testing/selftests/kselftest_module.h"
 
 #define BUF_SIZE 256
@@ -111,8 +112,14 @@ __test(const char *expect, int elen, const char *fmt, ...)
 * be able to print it as expected.
 */
failed_tests += do_test(BUF_SIZE, expect, elen, fmt, ap);
-   rand = 1 + prandom_u32_max(elen+1);
-   /* Since elen < BUF_SIZE, we have 1 <= rand <= BUF_SIZE. */
+   rand = prandom_u32_range_state(&rnd_state, 1, elen + 1);
+   /*
+* Except for elen == 0, we have 1 <= rand <= elen < BUF_SIZE,
+* i.e., the output is guaranteed to be truncated somewhere in
+* the middle, and we're not pretending the buffer to be
+* larger than it really is. For the boring case of elen == 0,
+* rand is 1, which is of course also <= BUF_SIZE.
+*/
failed_tests += do_test(rand, expect, elen, fmt, ap);
failed_tests += do_test(0, expect, elen, fmt, ap);
 
-- 
2.23.0



[PATCH 0/4] deterministic random testing

2020-10-25 Thread Rasmus Villemoes
This is a bit of a mixed bag.

The background is that I have some sort() and list_sort() rework
planned, but as part of that series I want to extend their their test
suites somewhat to make sure I don't goof up - and I want to use lots
of random list lengths with random contents to increase the chance of
somebody eventually hitting "hey, sort() is broken when the length is
3 less than a power of 2 and only the last two elements are out of
order". But when such a case is hit, it's vitally important that the
developer can reproduce the exact same test case, which means using a
deterministic sequence of random numbers.

Since Petr noticed [1] the non-determinism in test_printf in
connection with Arpitha's work on rewriting it to kunit, this prompted
me to use test_printf as a first place to apply that principle, and
get the infrastructure in place that will avoid repeating the "module
parameter/seed the rnd_state/report the seed used" boilerplate in each
module.

Shuah, assuming the kselftest_module.h changes are ok, I think it's
most natural if you carry these patches, though I'd be happy with any
other route as well.

[1] https://lore.kernel.org/lkml/20200821113710.GA26290@alley/


Rasmus Villemoes (4):
  prandom.h: add *_state variant of prandom_u32_max
  kselftest_module.h: unconditionally expand the KSTM_MODULE_GLOBALS()
macro
  kselftest_module.h: add struct rnd_state and seed parameter
  lib/test_printf.c: use deterministic sequence of random numbers

 Documentation/dev-tools/kselftest.rst  |  2 --
 include/linux/prandom.h| 29 
 lib/test_bitmap.c  |  3 --
 lib/test_printf.c  | 13 ---
 lib/test_strscpy.c |  2 --
 tools/testing/selftests/kselftest_module.h | 40 ++
 6 files changed, 72 insertions(+), 17 deletions(-)

-- 
2.23.0



[PATCH] watchdog: sbc_fitpc2_wdt: add __user annotations

2020-10-25 Thread Rasmus Villemoes
After a change to the put_user() macro on x86, kernel test robot has
started sending me complaints (from sparse) about passing kernel
pointers to put_user(). So add the __user annotations to the various
casts in fitpc2_wdt_ioctl(), and while in here, also make the write
method actually match the prototype of file_operations::write.

Reported-by: kernel test robot 
Signed-off-by: Rasmus Villemoes 
---
 drivers/watchdog/sbc_fitpc2_wdt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/watchdog/sbc_fitpc2_wdt.c 
b/drivers/watchdog/sbc_fitpc2_wdt.c
index 04483d6453d6a147703e..13db71e165836eb73249 100644
--- a/drivers/watchdog/sbc_fitpc2_wdt.c
+++ b/drivers/watchdog/sbc_fitpc2_wdt.c
@@ -78,7 +78,7 @@ static int fitpc2_wdt_open(struct inode *inode, struct file 
*file)
return stream_open(inode, file);
 }
 
-static ssize_t fitpc2_wdt_write(struct file *file, const char *data,
+static ssize_t fitpc2_wdt_write(struct file *file, const char __user *data,
size_t len, loff_t *ppos)
 {
size_t i;
@@ -125,16 +125,16 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned 
int cmd,
 
switch (cmd) {
case WDIOC_GETSUPPORT:
-   ret = copy_to_user((struct watchdog_info *)arg, &ident,
+   ret = copy_to_user((struct watchdog_info __user *)arg, &ident,
   sizeof(ident)) ? -EFAULT : 0;
break;
 
case WDIOC_GETSTATUS:
-   ret = put_user(0, (int *)arg);
+   ret = put_user(0, (int __user *)arg);
break;
 
case WDIOC_GETBOOTSTATUS:
-   ret = put_user(0, (int *)arg);
+   ret = put_user(0, (int __user *)arg);
break;
 
case WDIOC_KEEPALIVE:
@@ -143,7 +143,7 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned 
int cmd,
break;
 
case WDIOC_SETTIMEOUT:
-   ret = get_user(time, (int *)arg);
+   ret = get_user(time, (int __user *)arg);
if (ret)
break;
 
@@ -157,7 +157,7 @@ static long fitpc2_wdt_ioctl(struct file *file, unsigned 
int cmd,
fallthrough;
 
case WDIOC_GETTIMEOUT:
-   ret = put_user(margin, (int *)arg);
+   ret = put_user(margin, (int __user *)arg);
break;
}
 
-- 
2.23.0



[PATCH] kernel/sys.c: fix prototype of prctl_get_tid_address()

2020-10-23 Thread Rasmus Villemoes
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().

Reported-by: kernel test robot 
Signed-off-by: Rasmus Villemoes 
---
 kernel/sys.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 6401880dff74a80a4594..85395f5cebc34d073cf4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2238,12 +2238,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+static int prctl_get_tid_address(struct task_struct *me, int __user * __user 
*tid_addr)
 {
return put_user(me->clear_child_tid, tid_addr);
 }
 #else
-static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
+static int prctl_get_tid_address(struct task_struct *me, int __user * __user 
*tid_addr)
 {
return -EINVAL;
 }
@@ -2427,7 +2427,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, 
unsigned long, arg3,
error = prctl_set_mm(arg2, arg3, arg4, arg5);
break;
case PR_GET_TID_ADDRESS:
-   error = prctl_get_tid_address(me, (int __user **)arg2);
+   error = prctl_get_tid_address(me, (int __user * __user *)arg2);
break;
case PR_SET_CHILD_SUBREAPER:
me->signal->is_child_subreaper = !!arg2;
-- 
2.23.0



[PATCH] drm/ttm: add __user annotation in radeon_ttm_vram_read

2020-10-23 Thread Rasmus Villemoes
Keep sparse happy by preserving the __user annotation when casting.

Reported-by: kernel test robot 
Signed-off-by: Rasmus Villemoes 
---

kernel test robot has already started spamming me due to 9c5743dff. If
I don't fix those warnings I'll keep getting those emails for
months, so let me do the easy ones.


 drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 36150b7f31a90aa1eece..ecfe88b0a35d8f317712 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -1005,7 +1005,7 @@ static ssize_t radeon_ttm_vram_read(struct file *f, char 
__user *buf,
value = RREG32(RADEON_MM_DATA);
spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
 
-   r = put_user(value, (uint32_t *)buf);
+   r = put_user(value, (uint32_t __user *)buf);
if (r)
return r;
 
-- 
2.23.0



[PATCH 4.9-stable] scripts/setlocalversion: make git describe output more reliable

2020-10-23 Thread Rasmus Villemoes
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.

When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, make sure the abbreviated
sha1 always consists of exactly 12 hex characters. That is consistent
with the current rule for -stable patches, and is almost always enough
to identify the head commit unambiguously - in the few cases where it
does not, the v5.4.3-00021- prefix would certainly nail it down.

[Adapt to `` vs $() differences between 4.9 and upstream.]

Signed-off-by: Rasmus Villemoes 
Signed-off-by: Masahiro Yamada 
---
 scripts/setlocalversion | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index aa28c3f29809314bfa58..0c8741b795d0c82a38c9 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -44,7 +44,7 @@ scm_version()
 
# Check for git and a git repo.
if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-  head=`git rev-parse --verify --short HEAD 2>/dev/null`; then
+  head=$(git rev-parse --verify HEAD 2>/dev/null); then
 
# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
# it, because this version is defined in the top level Makefile.
@@ -58,11 +58,22 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-   if atag="`git describe 2>/dev/null`"; then
-   echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),$(NF))}'
-
-   # If we don't have a tag at all we print -g{commitish}.
+   #
+   # Ensure the abbreviated sha1 has exactly 12
+   # hex characters, to make the output
+   # independent of git version, local
+   # core.abbrev settings and/or total number of
+   # objects in the current repository - passing
+   # --abbrev=12 ensures a minimum of 12, and the
+   # awk substr() then picks the 'g' and first 12
+   # hex chars.
+   if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+   echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),substr($(NF),0,13))}'
+
+   # If we don't have a tag at all we print -g{commitish},
+   # again using exactly 12 hex chars.
else
+   head="$(echo $head | cut -c1-12)"
printf '%s%s' -g $head
fi
fi
-- 
2.23.0



[PATCH 4.4-stable] scripts/setlocalversion: make git describe output more reliable

2020-10-23 Thread Rasmus Villemoes
commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.

When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, make sure the abbreviated
sha1 always consists of exactly 12 hex characters. That is consistent
with the current rule for -stable patches, and is almost always enough
to identify the head commit unambiguously - in the few cases where it
does not, the v5.4.3-00021- prefix would certainly nail it down.

[Adapt to `` vs $() differences between 4.4 and upstream.]

Signed-off-by: Rasmus Villemoes 
Signed-off-by: Masahiro Yamada 
---
 scripts/setlocalversion | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index aa28c3f29809314bfa58..0c8741b795d0c82a38c9 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -44,7 +44,7 @@ scm_version()
 
# Check for git and a git repo.
if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-  head=`git rev-parse --verify --short HEAD 2>/dev/null`; then
+  head=$(git rev-parse --verify HEAD 2>/dev/null); then
 
# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
# it, because this version is defined in the top level Makefile.
@@ -58,11 +58,22 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-   if atag="`git describe 2>/dev/null`"; then
-   echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),$(NF))}'
-
-   # If we don't have a tag at all we print -g{commitish}.
+   #
+   # Ensure the abbreviated sha1 has exactly 12
+   # hex characters, to make the output
+   # independent of git version, local
+   # core.abbrev settings and/or total number of
+   # objects in the current repository - passing
+   # --abbrev=12 ensures a minimum of 12, and the
+   # awk substr() then picks the 'g' and first 12
+   # hex chars.
+   if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+   echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),substr($(NF),0,13))}'
+
+   # If we don't have a tag at all we print -g{commitish},
+   # again using exactly 12 hex chars.
else
+   head="$(echo $head | cut -c1-12)"
printf '%s%s' -g $head
fi
fi
-- 
2.23.0



[PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread Rasmus Villemoes
Quoting
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:

  You can define a local register variable and associate it with a
  specified register...

  The only supported use for this feature is to specify registers for
  input and output operands when calling Extended asm (see Extended
  Asm). This may be necessary if the constraints for a particular
  machine don't provide sufficient control to select the desired
  register.

On 32-bit x86, this is used to ensure that gcc will put an 8-byte
value into the %edx:%eax pair, while all other cases will just use the
single register %eax (%rax on x86-64). While the _ASM_AX actually just
expands to "%eax", note this comment next to get_user() which does
something very similar:

 * The use of _ASM_DX as the register specifier is a bit of a
 * simplification, as gcc only cares about it as the starting point
 * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
 * (%ecx being the next register in gcc's x86 register sequence), and
 * %rdx on 64 bits.

However, getting this to work requires that there is no code between
the assignment to the local register variable and its use as an input
to the asm() which can possibly clobber any of the registers involved
- including evaluation of the expressions making up other inputs.

In the current code, the ptr expression used directly as an input may
cause such code to be emitted. For example, Sean Christopherson
observed that with KASAN enabled and ptr being
current->set_child_tid (from schedule_tail()), the load of
current->set_child_tid causes a call to __asan_load8() to be emitted
immediately prior to the __put_user_4 call, and Naresh Kamboju reports
that various mmstress tests fail on KASAN-enabled builds. It's also
possible to synthesize a broken case without KASAN if one uses "foo()"
as the ptr argument, with foo being some "extern u64
__user *foo(void);" (though I don't know if that appears in real
code).

Fix it by making sure ptr gets evaluated before the assignment to
__val_pu, and add a comment that __val_pu must be the last thing
computed before the asm() is entered.

Cc: Sean Christopherson 
Reported-by: Naresh Kamboju 
Tested-by: Naresh Kamboju 
Fixes: d55564cfc222 ("x86: Make __put_user() generate an out-of-line call")
Signed-off-by: Rasmus Villemoes 
---

I tried to put together a changelog to earn just a little
claim-to-fame, feel free to edit, especially if I got something
wrong. Did I miss any appropriate *-by tags?

I'm wondering if one would also need to make __ptr_pu and __ret_pu
explicitly "%"_ASM_CX".

 arch/x86/include/asm/uaccess.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f13659523108..c9fa7be3df82 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -208,16 +208,24 @@ extern void __put_user_nocheck_2(void);
 extern void __put_user_nocheck_4(void);
 extern void __put_user_nocheck_8(void);
 
+/*
+ * ptr must be evaluated and assigned to the temporary __ptr_pu before
+ * the assignment of x to __val_pu, to avoid any function calls
+ * involved in the ptr expression (possibly implicitly generated due
+ * to KASAN) from clobbering %ax.
+ */
 #define do_put_user_call(fn,x,ptr) \
 ({ \
int __ret_pu;   \
+   void __user *__ptr_pu;  \
register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);   \
__chk_user_ptr(ptr);\
+   __ptr_pu = (ptr);   \
__val_pu = (x); \
asm volatile("call __" #fn "_%P[size]"  \
 : "=c" (__ret_pu), \
ASM_CALL_CONSTRAINT \
-: "0" (ptr),   \
+: "0" (__ptr_pu),  \
   "r" (__val_pu),  \
   [size] "i" (sizeof(*(ptr)))  \
 :"ebx");   \
-- 
2.23.0



Re: [PATCH v2] lib: Convert test_printf.c to KUnit

2020-10-23 Thread Rasmus Villemoes
On 22/10/2020 21.16, Andy Shevchenko wrote:
> On Thu, Oct 22, 2020 at 08:43:49PM +0530, Arpitha Raghunandan wrote:
>> Convert test lib/test_printf.c to KUnit. More information about
>> KUnit can be found at:
>> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
>> KUnit provides a common framework for unit tests in the kernel.
>> KUnit and kselftest are standardizing around KTAP; converting this
>> test to KUnit makes this test output in KTAP, which we are trying to
>> make the standard test result format for the kernel. More about
>> the KTAP format can be found at:
>> https://lore.kernel.org/linux-kselftest/cy4pr13mb1175b804e31e502221bc8163fd...@cy4pr13mb1175.namprd13.prod.outlook.com/.
>> I ran both the original and converted tests as is to produce the
>> output for success of the test in the two cases. I also ran these
>> tests with a small modification to show the difference in the output
>> for failure of the test in both cases. The modification I made is:
>> - test("127.000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr);
>> + test("127-000.000.001|127.0.0.1", "%pi4|%pI4", &sa.sin_addr, &sa.sin_addr);
>>
>> Original test success:
>> [0.591262] test_printf: loaded.
>> [0.591409] test_printf: all 388 tests passed
>>
>> Original test failure:
>> [0.619345] test_printf: loaded.
>> [0.619394] test_printf: vsnprintf(buf, 256, "%piS|%pIS", ...)
>> wrote '127.000.000.001|127.0.0.1', expected
>> '127-000.000.001|127.0.0.1'
>> [0.619395] test_printf: vsnprintf(buf, 25, "%piS|%pIS", ...) wrote
>> '127.000.000.001|127.0.0.', expected '127-000.000.001|127.0.0.'
>> [0.619396] test_printf: kvasprintf(..., "%piS|%pIS", ...) returned
>> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
>> [0.619495] test_printf: failed 3 out of 388 tests
>>
>> Converted test success:
>> # Subtest: printf-kunit-test
>> 1..1
>> ok 1 - selftest
>> ok 1 - printf-kunit-test
>>
>> Converted test failure:
>> # Subtest: printf-kunit-test
>> 1..1
>> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:82
>> vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote
>> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
>> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:82
>> vsnprintf(buf, 5, "%pi4|%pI4", ...) wrote '127.', expected '127-'
>> # selftest: EXPECTATION FAILED at lib/printf_kunit.c:118
>> kvasprintf(..., "%pi4|%pI4", ...) returned
>> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
>> not ok 1 - selftest
>> not ok 1 - printf-kunit-test
> 
> Not bad. Rasmus, what do you think?

Much better, but that '1..1' and reporting the entire test suite as 1
single (failing or passing) test is (also) a regression. Look at the
original

>> [0.591409] test_printf: all 388 tests passed

or

>> [0.619495] test_printf: failed 3 out of 388 tests

That's far more informative, and I'd prefer if the summary information
(whether in the all-good case or some-failing) included something like
this. In particular, I have at some point spotted that I failed to
properly hook up a new test case (or perhaps failed to re-compile, or
somehow still ran the old kernel binary, don't remember which it was) by
noticing that the total number of tests hadn't increased. The new output
would not help catch such PEBKACs.

Rasmus


Re: [LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

2020-10-23 Thread Rasmus Villemoes
On 23/10/2020 07.02, Sean Christopherson wrote:
> On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
>> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz  wrote:
>>>
>>> The kernel Naresh originally referred to is here:
>>>   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
>>
>> Thanks.
>>
>> And when I started looking at it, I realized that my original idea
>> ("just look for __put_user_nocheck_X calls, there aren't so many of
>> those") was garbage, and that I was just being stupid.
>>
>> Yes, the commit that broke was about __put_user(), but in order to not
>> duplicate all the code, it re-used the regular put_user()
>> infrastructure, and so all the normal put_user() calls are potential
>> problem spots too if this is about the compiler interaction with KASAN
>> and the asm changes.
>>
>> So it's not just a couple of special cases to look at, it's all the
>> normal cases too.
>>
>> Ok, back to the drawing board, but I think reverting it is probably
>> the right thing to do if I can't think of something smart.
>>
>> That said, since you see this on x86-64, where the whole ugly trick with that
>>
>>register asm("%"_ASM_AX)
>>
>> is unnecessary (because the 8-byte case is still just a single
>> register, no %eax:%edx games needed), it would be interesting to hear
>> if the attached patch fixes it. That would confirm that the problem
>> really is due to some register allocation issue interaction (or,
>> alternatively, it would tell me that there's something else going on).
> 
> I haven't reproduced the crash, but I did find a smoking gun that confirms the
> "register shenanigans are evil shenanigans" theory.  I ran into a similar 
> thing
> recently where a seemingly innocuous line of code after loading a value into a
> register variable wreaked havoc because it clobbered the input register.
> 
> This put_user() in schedule_tail():
> 
>if (current->set_child_tid)
>put_user(task_pid_vnr(current), current->set_child_tid);
> 
> generates the following assembly with KASAN out-of-line:
> 
>    0x810dccc9 <+73>:   xor    %edx,%edx
>    0x810dcccb <+75>:   xor    %esi,%esi
>    0x810dcccd <+77>:   mov    %rbp,%rdi
>    0x810dccd0 <+80>:   callq  0x810bf5e0 <__task_pid_nr_ns>
>    0x810dccd5 <+85>:   mov    %r12,%rdi
>    0x810dccd8 <+88>:   callq  0x81388c60 <__asan_load8>
>    0x810dccdd <+93>:   mov    0x590(%rbp),%rcx
>    0x810dcce4 <+100>:  callq  0x817708a0 <__put_user_4>
>    0x810dcce9 <+105>:  pop    %rbx
>    0x810dccea <+106>:  pop    %rbp
>    0x810dcceb <+107>:  pop    %r12
> 
> __task_pid_nr_ns() returns the pid in %rax, which gets clobbered by
> __asan_load8()'s check on current for the current->set_child_tid dereference.
> 

Yup, and you don't need KASAN to implicitly generate function calls for
you. With x86_64 defconfig, I get

extern u64 __user *get_destination(int x, int y);

void pu_test(void)
{
u64 big = 0x1234abcd5678;

if (put_user(big, get_destination(4, 5)))
pr_warn("uh");
}

to generate

4d60 <pu_test>:
    4d60:   53                      push   %rbx
    4d61:   be 05 00 00 00          mov    $0x5,%esi
    4d66:   bf 04 00 00 00          mov    $0x4,%edi
    4d6b:   e8 00 00 00 00          callq  4d70 <pu_test+0x10>
                        4d6c: R_X86_64_PC32     get_destination-0x4
    4d70:   48 89 c1                mov    %rax,%rcx
    4d73:   e8 00 00 00 00          callq  4d78 <pu_test+0x18>
                        4d74: R_X86_64_PC32     __put_user_8-0x4
    4d78:   85 c9                   test   %ecx,%ecx
    4d7a:   75 02                   jne    4d7e <pu_test+0x1e>
    4d7c:   5b                      pop    %rbx
    4d7d:   c3                      retq
    4d7e:   5b                      pop    %rbx
    4d7f:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        4d82: R_X86_64_32S      .rodata.str1.1+0xfa
    4d86:   e9 00 00 00 00          jmpq   4d8b <pu_test+0x2b>
                        4d87: R_X86_64_PC32     printk-0x4


That's certainly garbage: the value 0x1234abcd5678 is never loaded into
%rax at all, so __put_user_8 just sees whatever get_destination() happened
to return there. Now, I don't know if it's a sufficient fix (or could
break something else), but the obvious first step is rearranging so that
the ptr argument is evaluated before the assignment to __val_pu:

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 477c503f2753..b5d3290fcd09 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -235,13 +235,13 @@ extern void __put_user_nocheck_8(void);
 #define do_put_user_call(fn,x,ptr) \
 ({ \
int __ret_pu;   \
-   register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX);   \
+   __typeof__(ptr) __ptr = (ptr);  \
+   register __typeof__(*(ptr)) __val_pu asm("%"_ASM_AX) =

[PATCH] Kbuild.include: remove leftover comment for filechk utility

2020-10-16 Thread Rasmus Villemoes
After commit 43fee2b23895 ("kbuild: do not redirect the first
prerequisite for filechk"), the rule is no longer automatically passed
$< as stdin, so remove the stale comment.

Fixes: 43fee2b23895 ("kbuild: do not redirect the first prerequisite for 
filechk")
Signed-off-by: Rasmus Villemoes 
---
 scripts/Kbuild.include | 2 --
 1 file changed, 2 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 83a1637417e5..08e011175b4c 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -56,8 +56,6 @@ kecho := $($(quiet)kecho)
 # - If no file exist it is created
 # - If the content differ the new file is used
 # - If they are equal no change, and no timestamp update
-# - stdin is piped in from the first prerequisite ($<) so one has
-#   to specify a valid file as first prerequisite (often the kbuild file)
 define filechk
$(Q)set -e; \
mkdir -p $(dir $@); \
-- 
2.23.0



Re: [GIT PULL] printk for 5.10 (includes lockless ringbuffer)

2020-10-14 Thread Rasmus Villemoes
On 14/10/2020 16.16, Geert Uytterhoeven wrote:
> Hi Petr,
> 
> On Mon, Oct 12, 2020 at 4:50 PM Petr Mladek  wrote:
>> - Fully lockless ringbuffer implementation, including the support for
>>   continuous lines. It will allow to store and read messages in any
>>   situation without the risk of deadlocks and without the need
>>   of temporary per-CPU buffers.
> 
> linux-m68k-atari_defconfig$ bloat-o-meter vmlinux.old
> vmlinux.lockless_ringbuffer
> add/remove: 39/16 grow/shrink: 9/15 up/down: 214075/-4362 (209713)
> Function                                     old     new   delta
> _printk_rb_static_infos                        -  180224 +180224
> _printk_rb_static_descs                        -   24576  +24576
> [...]
> 
> Seriously?!? Or am I being misled by the tools?
> 
> linux-m68k-atari_defconfig$ size vmlinux.old vmlinux.lockless_ringbuffer
>textdata bss dec hex filename
> 3559108 941716 12 4678596 4763c4 vmlinux.old
> 3563922 1152496 175276 4891694 4aa42e vmlinux.lockless_ringbuffer
> 
> Apparently not...

Hm, that's quite a lot. And the only reason the buffers don't live
entirely in .bss is because a few of their entries have non-zero
initializers.

Perhaps one could add a .init.text.initialize_static_data section of
function pointers, with the _DEFINE_PRINTKRB macro growing something like

static void __init __initialize_printkrb_##name(void) { \
  _##name##_descs[_DESCS_COUNT(descbits) - 1] = ...; \
  _##name##_infos[0] = ...; \
  _##name##_infos[_DESCS_COUNT(descbits) - 1] = ...; \
} \
static_data_initializer(__initialize_printkrb_##name);

with static_data_initializer being the obvious yoga for putting a
function pointer in the .init.text.initialize_static_data section. Then
very early in start_kernel(), probably first thing, iterate that section
and call all the functions. But maybe that's not even early enough?
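
Concretely, something like this - a completely untested sketch, not an
actual patch; the section name, the __start/__stop bounds and the exact
attribute spelling are assumptions that would need matching linker-script
support:

typedef void (*static_data_init_fn)(void);

#define static_data_initializer(fn) \
	static static_data_init_fn \
	__attribute__((__used__, __section__(".init.static_data_init"))) \
	__static_data_init_##fn = fn

/* Bounds of that section, to be provided by the linker script. */
extern static_data_init_fn __start_static_data_init[];
extern static_data_init_fn __stop_static_data_init[];

/* Called very early from start_kernel(). */
static void __init run_static_data_initializers(void)
{
	static_data_init_fn *fn;

	for (fn = __start_static_data_init; fn < __stop_static_data_init; fn++)
		(*fn)();
}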

Rasmus


Re: using one gpio for multiple dht22 sensors

2020-10-14 Thread Rasmus Villemoes
On 14/10/2020 11.12, Peter Rosin wrote:
> Hi Rasmus,
> 
> On 2020-10-13 23:34, Rasmus Villemoes wrote:
>> Hi Peter
>>
>> I'm going to hook up a bunch of dht22 humidity (and temperature) sensors
>> [1] (drivers/iio/humidity/dht11.c), but partly due to limited number of
>> available gpios, I'm also going to use a 74hc4051 multiplexer [2], so
>> that all the dht22's actually sit on the same gpio.
>>
>> It's pretty obvious how the multiplexer is to be described in
>> device-tree (I'm probably going to send a patch to add support for an
>> optional "enable-gpio", as on the 74hc4051, so that MUX_IDLE_DISCONNECT
>> gets supported).
>>
>> It also seems like io-channel-mux should somehow magically apply to all
>> kinds of iio devices, but it can't be that simple. And if it is, I can't
>> figure out how to write the DT. So:
> 
> The io-channel-mux is for the situation where you have only one iio-device,
> and where you can control its environment with a mux. I.e. it is not about
> how the device communicates with the host. You then get one new "virtual"
> iio-device for every (valid) state of the mux, and when you access those
> iio-devices, the mux is configured to select the correct input/output for
> the iio-device in question. At least, it should be possible for output
> devices as well, but I guess you kind of get into trouble with the output
> signal not persisting when the mux changes state, but that is a problem
> for the HW people ;-). I originally used it for a single adc device where
> the mux simply selected what to measure.

I see, so it's not really applicable here.

>> - do I need to teach the dht11.c driver to be mux-aware?
>> - currently, a dht11/dht22 shows up in sysfs with just two files,
>> in_humidityrelative_input and in_temp_input. Now, should I end up with
>> one device and, say, 2*8 files (so that it seems like one sensor with
>> eight input channels), or should it show up as eight different devices?
>> If the latter, I guess the device tree would end up with the same "gpios
>> = " spec for all eight - is that even allowed?
> 
> It's not 100% clear to me how this is connected, but I'm guessing that you
> have the "DATA-signal" pin of the dht22s connected to the Y pins of the 4051,
> and that Z pin of the 4051 is connected to some gpio, so that you are able
> to communicate with the various dht22s by controlling the mux.

Exactly. And currently, I just have one dht22 in DT, listing the gpio
the Z-pin is connected to, and have tested that I can talk to all of
them by manipulating the mux from userspace (well, the driver thinks
there's only one dht22, so it imposes a 2 second delay before trying to
talk to "it" again - which is one reason I want to do this right).

> This calls for a mux on the single source of communication from the host
> to the various real devices that share this single source of communication.
> In other words, more like an i2c-mux than an io-channel-mux.
> 
> I.e., what you need is "the other kind" of mux/gpio driver, i.e. a driver
> that "fans out" a single gpio so that other drivers can treat the pins on
> the other side of the mux as gpios.

Hm, so I should create a new "gpio-controller" driver that presents each
of the Y pins as a separate gpio to other DT consumers? But while all
the devm_gpiod_get() would then succeed, only one of them could be
usable at a time. The "gpio-controller" driver would have to handle any
gpiod_xxx call by setting the mux appropriately, then forward the
gpiod_xxx call to the Z pin - but it can't really know when the consumer
is actually done with the gpio. And the consumer (dht11.c) has no way to
know that it should somehow "acquire" and "release" the gpiod when it's
not in use.
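
For what it's worth, the core of such a fan-out chip would presumably look
something like the sketch below (untested, all names made up) - and it
shows exactly that problem: the mux gets selected and deselected around
every single access, because the chip cannot know when dht11.c is "done"
with the line.

#include <linux/gpio/consumer.h>
#include <linux/gpio/driver.h>
#include <linux/mux/consumer.h>

struct gpio_fanout {
	struct gpio_chip	chip;	/* one line per Y pin of the 4051 */
	struct gpio_desc	*z_pin;	/* the one real gpio behind the mux */
	struct mux_control	*mux;
};

static int gpio_fanout_get(struct gpio_chip *chip, unsigned int offset)
{
	struct gpio_fanout *fan = gpiochip_get_data(chip);
	int ret;

	/* Route Y<offset> to Z for the duration of this one call only. */
	ret = mux_control_select(fan->mux, offset);
	if (ret)
		return ret;
	ret = gpiod_get_value_cansleep(fan->z_pin);
	mux_control_deselect(fan->mux);
	return ret;
}

/* chip.can_sleep would have to be set, since mux_control_select() may sleep. */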

> I guess you'd have to sort out irq-routing through this new mux-gpio driver
> since the dht22 driver uses irqs to read input,

Well... I've had to monkeypatch the dht11 driver to do
local_irq_save()/read the gpio tightly in a loop until 80something edges
have been seen/local_irq_restore() to get stable readings (maybe I'll do
that as a real patch hanging off a module parameter - the dht22 "1-wire"
protocol is kinda crappy...), so that's not really a problem.

> 
> I'm not an expert on the intricacies of the gpio subsystem...
> 
> Does that help?

Somewhat. Thanks for the quick reply, I'll have to do some more digging
and experimentation.

Rasmus


Re: [PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-14 Thread Rasmus Villemoes
On 13/10/2020 23.09, Al Viro wrote:
> On Tue, Oct 13, 2020 at 04:06:08PM +0200, Giuseppe Scrivano wrote:
>> +spin_lock(&cur_fds->file_lock);
>> +fdt = files_fdtable(cur_fds);
>> +cur_max = fdt->max_fds - 1;
>> +max_fd = min(max_fd, cur_max);
>> +while (fd <= max_fd)
>> +__set_close_on_exec(fd++, fdt);
>> +spin_unlock(&cur_fds->file_lock);
> 
>   First of all, this is an atrocious way to set all bits
> in a range.  What's more, you don't want to set it for *all*
> bits - only for the ones present in open bitmap.  It's probably
> harmless at the moment, but let's not create interesting surprises
> for the future.

Eh, why not? They can already be set for unallocated slots:

commit 5297908270549b734c7c2556745e2385b6d4941d
Author: Mateusz Guzik 
Date:   Tue Oct 3 12:58:14 2017 +0200

vfs: stop clearing close on exec when closing a fd

Codepaths allocating a fd always make sure the bit is set/unset
depending on flags, thus clearing on close is redundant.

And while we're on that subject, yours truly suggested exactly that two
years prior [1], with a follow-up [2] in 2018 to do what wasn't done in
5297908, but (still) seems like obvious micro-optimizations, given that
the close_on_exec bitmap is not maintained as a subset of the open
bitmap. Mind taking a look at [2]?

[1]
https://lore.kernel.org/lkml/1446543679-28849-1-git-send-email-li...@rasmusvillemoes.dk/t/#u
[2]
https://lore.kernel.org/lkml/20181024160159.25884-1-li...@rasmusvillemoes.dk/

Rasmus


using one gpio for multiple dht22 sensors

2020-10-14 Thread Rasmus Villemoes
Hi Peter

Since you're the author of io-channel-mux.txt, gpio-mux.txt and
mux-controller.txt, I hope you don't mind me asking some perhaps silly
questions.

I'm going to hook up a bunch of dht22 humidity (and temperature) sensors
[1] (drivers/iio/humidity/dht11.c), but partly due to limited number of
available gpios, I'm also going to use a 74hc4051 multiplexer [2], so
that all the dht22's actually sit on the same gpio.

It's pretty obvious how the multiplexer is to be described in
device-tree (I'm probably going to send a patch to add support for an
optional "enable-gpio", as on the 74hc4051, so that MUX_IDLE_DISCONNECT
gets supported).

It also seems like io-channel-mux should somehow magically apply to all
kinds of iio devices, but it can't be that simple. And if it is, I can't
figure out how to write the DT. So:

- do I need to teach the dht11.c driver to be mux-aware?
- currently, a dht11/dht22 shows up in sysfs with just two files,
in_humidityrelative_input and in_temp_input. Now, should I end up with
one device and, say, 2*8 files (so that it seems like one sensor with
eight input channels), or should it show up as eight different devices?
If the latter, I guess the device tree would end up with the same "gpios
= " spec for all eight - is that even allowed?

If you can show how the DT is supposed to describe this situation, I'll
try to fill out the missing pieces on the driver side.

[I could also just not describe the multiplexer at all and control
everything about it by toggling gpios from userspace, and just have a
single dht22 in DT, but I prefer doing this "the right way" if it's
feasible.]

Thanks,
Rasmus

Just FTR:

[1]
https://www.electroschematics.com/wp-content/uploads/2015/02/DHT22-datasheet.pdf
[2] https://assets.nexperia.com/documents/data-sheet/74HC_HCT4051.pdf


Re: [PATCH 1/2] fs, close_range: add flag CLOSE_RANGE_CLOEXEC

2020-10-13 Thread Rasmus Villemoes
On 13/10/2020 22.54, Christian Brauner wrote:
> On Tue, Oct 13, 2020 at 04:06:08PM +0200, Giuseppe Scrivano wrote:
> 
> Hey Guiseppe,
> 
> Thanks for the patch!
> 
>> When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't
>> immediately close the files but it sets the close-on-exec bit.
> 
> Hm, please expand on the use-cases a little here so people know where
> and how this is useful. Keeping the rationale for a change in the commit
> log is really important.
> 

> I think I don't have quarrels with this patch in principle but I wonder
> if something like the following wouldn't be easier to follow:
> 
> diff --git a/fs/file.c b/fs/file.c
> index 21c0893f2f1d..872a4098c3be 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -672,6 +672,32 @@ int __close_fd(struct files_struct *files, unsigned fd)
>  }
>  EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
>  
> +static inline void __range_cloexec(struct files_struct *cur_fds,
> +unsigned int fd, unsigned max_fd)
> +{
> + struct fdtable *fdt;
> + spin_lock(&cur_fds->file_lock);
> + fdt = files_fdtable(cur_fds);
> + while (fd <= max_fd)
> + __set_close_on_exec(fd++, fdt);

Doesn't that want to be

  bitmap_set(fdt->close_on_exec, fd, max_fd - fd + 1)

to do word-at-a-time? I assume this would mostly be called with (3, ~0U)
as arguments or something like that.
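
I.e., roughly this (untested, just transcribing the suggestion into the
helper above):

static inline void __range_cloexec(struct files_struct *cur_fds,
				   unsigned int fd, unsigned int max_fd)
{
	struct fdtable *fdt;

	spin_lock(&cur_fds->file_lock);
	fdt = files_fdtable(cur_fds);
	/* Set all bits in [fd, max_fd] word-at-a-time instead of one
	 * __set_close_on_exec() call per descriptor. */
	bitmap_set(fdt->close_on_exec, fd, max_fd - fd + 1);
	spin_unlock(&cur_fds->file_lock);
}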

Rasmus


Re: [GIT PULL] io_uring updates for 5.10-rc1

2020-10-13 Thread Rasmus Villemoes
On 13/10/2020 21.49, Jens Axboe wrote:
> On 10/13/20 1:46 PM, Linus Torvalds wrote:
>> On Mon, Oct 12, 2020 at 6:46 AM Jens Axboe  wrote:
>>>
>>> Here are the io_uring updates for 5.10.
>>
>> Very strange. My clang build gives a warning I've never seen before:
>>
>>/tmp/io_uring-dd40c4.s:26476: Warning: ignoring changed section
>> attributes for .data..read_mostly
>>
>> and looking at what clang generates for the *.s file, it seems to be
>> the "section" line in:
>>
>> .type   io_op_defs,@object  # @io_op_defs
>> .section.data..read_mostly,"a",@progbits
>> .p2align4
>>
>> I think it's the combination of "const" and "__read_mostly".
>>
>> I think the warning is sensible: how can a piece of data be both
>> "const" and "__read_mostly"? If it's "const", then it's not "mostly"
>> read - it had better be _always_ read.
>>
>> I'm letting it go, and I've pulled this (gcc doesn't complain), but
>> please have a look.
> 
> Huh weird, I'll take a look. FWIW, the construct isn't unique across
> the kernel.

Citation needed. There's lots of "pointer to const foo" stuff declared
as __read_mostly, but I can't find any objects that are themselves both
const and __read_mostly - other than io_op_defs and io_uring_fops, which now are.

But... there's something a little weird:

$ grep read_most -- fs/io_uring.s
.section.data..read_mostly,"a",@progbits
$ readelf --wide -S fs/io_uring.o | grep read_most
  [32] .data..read_mostly PROGBITS 01b4e0 000188
00  WA  0   0 32

(this is with gcc/gas). So despite that .section directive not saying
"aw", the section got the W flag anyway. There are lots of

.section "__tracepoints_ptrs", "a"
  .pushsection .smp_locks,"a"

in the .s file, and those sections do end up with just the A bit in the
.o file. Does gas maybe somehow special-case a section name starting
with .data?

Rasmus


Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

2020-10-13 Thread Rasmus Villemoes
On 10/07/2020 17.57, Matthew Wilcox wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
> 
> If systemd is using it, is it even worth making it conditional any more?
> Maybe for CONFIG_EXPERT builds, it could be de-selectable.
> 

[hm, I dropped the ball, sorry for the necromancy]

Well, first, I don't want to change any defaults here, if that is to be
done, it should be a separate patch.

Second, yes, systemd uses it for the de-duplication, and for that reason
recommends CONFIG_CHECKPOINT_RESTORE (at least, according to their
README) - but I'm not aware of any daemons that actually make use of
systemd's file descriptor store, so it's not really something essential
to every systemd-based system out there. It would be nice if systemd
could change its recommendation to just CONFIG_KCMP_SYSCALL.

But it's also useful for others, e.g. I have some code that wants to
temporarily replace stdin/stdout/stderr with some other file
descriptors, but needs to preserve the '0 is a dup of 1, or not' state
(i.e., is the same struct file) - that cannot reliably be determined
from fstat()/lseek(SEEK_CUR)/F_GETFL or whatever else one could throw at
an fd.
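
For illustration, that check boils down to something like this from
userspace (kcmp() has no glibc wrapper, so it goes through syscall(2)):

#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

static int same_struct_file(int fd1, int fd2)
{
	pid_t pid = getpid();

	/* 0 means both fds refer to the same struct file. */
	return syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2) == 0;
}

int main(void)
{
	printf("fd 0 %s a dup of fd 1\n",
	       same_struct_file(0, 1) ? "is" : "is not");
	return 0;
}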

Rasmus


Re: [PATCH] lib: Convert test_printf.c to KUnit

2020-10-13 Thread Rasmus Villemoes
On 12/10/2020 22.46, Brendan Higgins wrote:
> On Fri, Aug 21, 2020 at 03:28:49PM +0300, Andy Shevchenko wrote:
>> On Fri, Aug 21, 2020 at 01:37:10PM +0200, Petr Mladek wrote:
>>> On Mon 2020-08-17 09:06:32, Rasmus Villemoes wrote:
>>>> On 17/08/2020 06.30, Arpitha Raghunandan wrote:
>>>>> Converts test lib/test_printf.c to KUnit.
>>>>> More information about KUnit can be found at
>>>>> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
>>>>> KUnit provides a common framework for unit tests in the kernel.
>>>>
>>>> So I can continue to build a kernel with some appropriate CONFIG set to
>>>> y, boot it under virt-me, run dmesg and see if I broke printf? That's
>>>> what I do now, and I don't want to have to start using some enterprisy
>>>> framework.
>>>
>>> I had the same concern. I have tried it.
> 
> Sorry you feel that way. Do you have any suggestions on how we can make
> it seem less enterprisy? Seems like there are people here who are not a
> fan of the output format, some of which we can fix here, some of which is
> part of KTAP[1].

I'm fine with machine-readable TAP, but I most definitely also want
human-readable, which means all the excessive and pointless lines need
to go away.

>> Which raises an obvious question: did the people who converted this test it
>> themselves? Looks like janitor work in the area without understanding the
>> area well enough.
> 
> Looks to me like Arpitha ran it, but you are right, we don't have a lot
> of familiarity with this area; we were treating it as "janitor work" as
> you say.
> 
> Our intention was just to take some existing tests and as non-invasively
> as possible, get them to report using a common format, and maybe even
> get some of the tests to follow a common pattern.
> 
>> Probably I will NAK all those patches from now on, until there are good
>> commit messages covering the raised aspects, including references to
>> before-and-after output for passed and failed test cases.
> 
> Fair enough, hopefully we can address these issues in the next revision.
> 
> One issue though, with the "before and after outcome" you are
> referencing; are you referring to the issue that Petr pointed out in how
> they are inconsistent:
> 
>    + original code: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote '127.0', 
> expected '127-0'
>    + kunit code: vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote 
> '127.000.000.001|127', expected '127-000.000.001|127'  
> 
> (I think Rasmus addressed this.) Or are your referring to something
> else?

Yeah, that change is fine and expected, can we stop bringing that up.

It's all the explicit "memcmp() == 0 failed" gunk that at least I am
concerned with. If you can get rid of that (basically, stop stringifying
the code, that's completely irrelevant) and just get the messages from
the test itself that explain what went wrong. I'm fine with
interspersing that with a few TAP-readable lines. But things like

 Expected memcmp(test_buffer, expect, written) == 0, but
   memcmp(test_buffer, expect, written) == 1
   0 == 0

are utterly useless. We're not _testing_ memcmp, we're _using_ it to
know if vsprintf() did as we expected. So just mechanically changing
"memcmp() == 0" into "SOME_MACRO(memcmp(), 0)" is never going to work,
at least when SOME_MACRO does the stringify and ends up producing the
above. But if you can end up producing

[   56.795433] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76
   vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote
'127.000.000.001|127', expected '127-000.000.001|127'

that's fine; that's basically just prepending an EXPECTATION FAILED line
to the existing output.

So doing it properly would probably be either

- change the existing pr_warn()s to use some KUNIT macro that generates
whatever extra info is needed by TAP, in addition to the current
human-readable message, and/or
- just add a few lines of TAP-suitable FAIL/PASS lines here and there

but let me repeat that the control flow (early returns) in do_test()
cannot be modified.

Rasmus


Re: [PATCH] bcache: Use #ifdef instead of boolean variable

2020-10-09 Thread Rasmus Villemoes
On 09/10/2020 20.34, Alex Dewar wrote:
> The variable async_registration is used to indicate whether or not
> CONFIG_BCACHE_ASYNC_REGISTRATION is enabled, triggering a (false)
> warning in Coverity about unreachable code. However, it seems clearer in
> this case just to use an #ifdef, so let's do that instead.
> 
> Addresses-Coverity-ID: 1497746: Control flow issues (DEADCODE)

I think that coverity check needs to be ignored. The kernel is full of
things that are supposed to be eliminated by the compiler, but still
checked for valid syntax etc. Often it's even more hidden than this,
something like

// some header
#ifdef CONFIG_FOO
int foo(void);
#else
static inline int foo(void) { return 0; }
#endif

// code

  if (foo()) { ... // this goes away for CONFIG_FOO=n }

> Signed-off-by: Alex Dewar 
> ---
>  drivers/md/bcache/super.c | 40 +--
>  1 file changed, 17 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 46a00134a36a..6d4127881c6a 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2504,11 +2504,6 @@ static ssize_t register_bcache(struct kobject *k, 
> struct kobj_attribute *attr,
>   struct cache_sb_disk *sb_disk;
>   struct block_device *bdev;
>   ssize_t ret;
> - bool async_registration = false;
> -
> -#ifdef CONFIG_BCACHE_ASYNC_REGISTRATION
> - async_registration = true;
> -#endif

If anything, this should simply be changed to

  bool async_registration = IS_ENABLED(CONFIG_BCACHE_ASYNC_REGISTRATION);

Rasmus


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-10-01 Thread Rasmus Villemoes
On 30/09/2020 15.35, Steven Rostedt wrote:
> On Wed, 30 Sep 2020 10:06:24 +0200
> Rasmus Villemoes  wrote:
> 
>> True. But remember that printk is called from _everywhere_, with all
>> sorts of locks held and/or preemption disabled or whatnot, and every
>> cycle spent in printk makes those windows wider. Doubling the cost of
>> every single printk by unconditionally doing vsnprintf() twice is a bad
>> idea.
> 
> But the console output is usually magnitudes more expensive than the
> vsnprintf(), would doing it twice really make a difference?

AFAIU, not every message gets printed to the console directly - syslog(2):

       /proc/sys/kernel/printk
              /proc/sys/kernel/printk is a writable file containing four
              integer values that influence kernel printk() behavior when
              printing or logging error messages.  The four values are:

              console_loglevel
                     Only messages with a log level lower than this value
                     will be printed to the console.  The default value
                     for this field is DEFAULT_CONSOLE_LOGLEVEL (7), but
                     it is set to 4 if the kernel command line contains
                     the word "quiet",

So the normal state of things is that you don't pay the cost of printing
to the console for all the pr_debug (ok, they may be compiled out or
run-time disabled depending on DYNAMIC_DEBUG etc.), nor info, notice,
warn. For those messages that are not directly written to the console,
the vsnprintf() is a large part of the cost (not exactly half, of
course, so doubling is an exaggeration, but whether it's 70% or 100%
doesn't really matter).

I'm not at all concerned about pr_err and above becoming more expensive,
they are rare. But random drivers are filled with random pr_info in
random contexts - just a small selection from dmesg -x shows these
really important things:

kern  :info  : [ 4631.338105] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179
- Link status is: 1
kern  :info  : [ 4642.218100] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179
- Link status is: 0
kern  :info  : [ 4643.882038] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179
- Link status is: 1
kern  :info  : [ 4667.562011] ax88179_178a 3-13.2.3.3:1.0 eth0: ax88179
- Link status is: 0
...
kern  :info  : [ 9149.215456] [drm] ring test on 1 succeeded in 1 usecs
kern  :info  : [ 9149.215459] [drm] ring test on 2 succeeded in 1 usecs
kern  :info  : [ 9149.215466] [drm] ring test on 3 succeeded in 4 usecs

and if I'm reading the code correctly, the former is even an example of
something that happens in irq context.

Rasmus


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-30 Thread Rasmus Villemoes
On 25/09/2020 10.28, Petr Mladek wrote:
> On Thu 2020-09-24 14:32:49, Rasmus Villemoes wrote:
>> On 24/09/2020 11.54, Rasmus Villemoes wrote:
>>> On 23/09/2020 17.11, Petr Mladek wrote:
>>>> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>>>>> vprintk_store() is using a single static buffer as a temporary
>>>>> sprint buffer for the message text. This will not work once
>>>>> @logbuf_lock is removed. Replace the single static buffer with a
>>>>> pool of buffers.
>>>>
>>>> The buffer is used because we do not know the length of the
>>>> formatted message to reserve the right space in the ring buffer
>>>> in advance.
>>>>
>>>> There was the idea to call vsprintf(NULL, fmt, args) to count
>>>> the length in advance.
>>>
>>> sprintf is dog slow.
> 
> Well, printk() is a slow path. It has never been too optimized for
> speed. The main purpose is to report problems and eventually some
> interesting information.

True. But remember that printk is called from _everywhere_, with all
sorts of locks held and/or preemption disabled or whatnot, and every
cycle spent in printk makes those windows wider. Doubling the cost of
every single printk by unconditionally doing vsnprintf() twice is a bad
idea.

Rasmus


Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps

2020-09-24 Thread Rasmus Villemoes
On 24/09/2020 15.58, YiFei Zhu wrote:
> On Thu, Sep 24, 2020 at 8:46 AM Rasmus Villemoes
>  wrote:
>> But one thing I'm wondering about and I haven't seen addressed anywhere:
>> Why build the bitmap on the kernel side (with all the complexity of
>> having to emulate the filter for all syscalls)? Why can't userspace just
>> hand the kernel "here's a new filter: the syscalls in this bitmap are
>> always allowed noquestionsasked, for the rest, run this bpf". Sure, that
>> might require a new syscall or extending seccomp(2) somewhat, but isn't
>> that a _lot_ simpler? It would probably also mean that the bpf we do get
>> handed is a lot smaller. Userspace might need to pass a couple of
>> bitmaps, one for each relevant arch, but you get the overall idea.
> 
> Perhaps. The thing is, the current API expects any filter attaches to
> be "additive". If a new filter gets attached that says "disallow read"
> then no matter whatever has been attached already, "read" shall not be
> allowed at the next syscall, bypassing all previous allowlist bitmaps
> (so you need to emulate the bpf anyways here?). We should also not
> have a API that could let anyone escape the secomp jail. Say "prctl"
> is permitted but "read" is not permitted, one must not be allowed to
> attach a bitmap so that "read" now appears in the allowlist. The only
> way this could potentially work is to attach a BPF filter and a bitmap
> at the same time in the same syscall, which might mean API redesign?

Yes, the man page would read something like

       SECCOMP_SET_MODE_FILTER_BITMAP
              The system calls allowed are defined by a pointer to a
              Berkeley Packet Filter (BPF) passed via args.  This
              argument is a pointer to a struct sock_fprog_bitmap;

with that struct containing whatever information/extra pointers needed
for passing the bitmap(s) in addition to the bpf prog.
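
Something along these lines, say (completely made-up struct and field
names, just to sketch the shape of it):

struct sock_fprog_bitmap {
	unsigned short		len;		/* number of BPF instructions */
	struct sock_filter	*filter;	/* the usual cBPF program */
	unsigned int		nr_syscalls;	/* bits in each bitmap */
	__u64			*allow_bitmap;	/* always-allowed syscalls */
};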

And SECCOMP_SET_MODE_FILTER would internally just be updated to work
as-if all-zero allow-bitmaps were passed along. The internal kernel
bitmap would just be the and of the bitmaps in the filter stack.

Sure, it's UAPI, so it would certainly need more careful thought on details
of just what the arg struct looks like etc. etc., but I was wondering why
it hadn't been discussed at all.

>> I'm also a bit worried about the performance of doing that emulation;
>> that's constant extra overhead for, say, launching a docker container.
> 
> IMO, launching a docker container is so expensive this should be negligible.

Regardless, I'd like to see some numbers, certainly for the "how much
faster does a getpid() or read() or any of the other syscalls that
nobody disallows" get, but also "what's the cost of doing that emulation
at seccomp(2) time".

Rasmus


Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps

2020-09-24 Thread Rasmus Villemoes
On 24/09/2020 01.29, Kees Cook wrote:
> rfc: 
> https://lore.kernel.org/lkml/20200616074934.1600036-1-keesc...@chromium.org/
> alternative: 
> https://lore.kernel.org/containers/cover.1600661418.git.yifei...@illinois.edu/
> v1:
> - rebase to for-next/seccomp
> - finish X86_X32 support for both pinning and bitmaps
> - replace TLB magic with Jann's emulator
> - add JSET insn
> 
> TODO:
> - add ALU|AND insn
> - significantly more testing
> 
> Hi,
> 
> This is a refresh of my earlier constant action bitmap series. It looks
> like the RFC was missed on the container list, so I've CCed it now. :)
> I'd like to work from this series, as it handles the multi-architecture
> stuff.

So, I agree with Jann's point that the only thing that matters is that
always-allowed syscalls are indeed allowed fast.

But one thing I'm wondering about and I haven't seen addressed anywhere:
Why build the bitmap on the kernel side (with all the complexity of
having to emulate the filter for all syscalls)? Why can't userspace just
hand the kernel "here's a new filter: the syscalls in this bitmap are
always allowed noquestionsasked, for the rest, run this bpf". Sure, that
might require a new syscall or extending seccomp(2) somewhat, but isn't
that a _lot_ simpler? It would probably also mean that the bpf we do get
handed is a lot smaller. Userspace might need to pass a couple of
bitmaps, one for each relevant arch, but you get the overall idea.

I'm also a bit worried about the performance of doing that emulation;
that's constant extra overhead for, say, launching a docker container.

Regardless of how the kernel's bitmap gets created, something like

+	if (nr < NR_syscalls) {
+		if (test_bit(nr, bitmaps->allow)) {
+			*filter_ret = SECCOMP_RET_ALLOW;
+			return true;
+		}

probably wants some nospec protection somewhere to avoid the irony of
seccomp() being used actively by bad guys.
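
Something like the usual array_index_nospec() clamping, I suppose - an
untested sketch of how the snippet above could be guarded:

#include <linux/nospec.h>
#include <linux/seccomp.h>

static bool bitmap_allows(const unsigned long *allow, u32 nr, u32 *filter_ret)
{
	if (nr >= NR_syscalls)
		return false;
	/* Keep a mispredicted nr from indexing past the bitmap under
	 * speculation. */
	nr = array_index_nospec(nr, NR_syscalls);
	if (!test_bit(nr, allow))
		return false;
	*filter_ret = SECCOMP_RET_ALLOW;
	return true;
}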

Rasmus


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-24 Thread Rasmus Villemoes
On 24/09/2020 11.54, Rasmus Villemoes wrote:
> On 23/09/2020 17.11, Petr Mladek wrote:
>> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>>> vprintk_store() is using a single static buffer as a temporary
>>> sprint buffer for the message text. This will not work once
>>> @logbuf_lock is removed. Replace the single static buffer with a
>>> pool of buffers.
>>
>> The buffer is used because we do not know the length of the
>> formatted message to reserve the right space in the ring buffer
>> in advance.
>>
>> There was the idea to call vsprintf(NULL, fmt, args) to count
>> the length in advance.
> 
> sprintf is dog slow. If you do this, perhaps say "we can afford to use
> 128 bytes of stack" and do vsnprintf(stackbuf, 128, fmt, args) to do the
> counting, and in the vast majority of cases where the text fits we don't
> need to do vsprintf() again.

Or, since 128 bytes of stack may be too much, combine the "pre-allocate
a few buffers" with "but fall back to vsprintf(NULL) if we can't get
one". That should allow choosing the X in X*1024 smaller than what
worst-case requires (which will never actually be used, assuming the
machine is doing something useful rather than just printk'ing all day
long) and still works even when a tmp buffer can't be obtained.
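
In other words, something like this untested sketch, where
get_sprint_buf()/put_sprint_buf(), SPRINT_BUF_SIZE and reserve_record()
are all made-up stand-ins for the buffer pool and ringbuffer APIs:

static void store_message(const char *fmt, va_list args)
{
	char *tmp = get_sprint_buf();	/* try the small buffer pool */
	va_list args2;
	char *rec;
	int len;

	va_copy(args2, args);
	if (tmp)
		len = vsnprintf(tmp, SPRINT_BUF_SIZE, fmt, args2);
	else
		len = vsnprintf(NULL, 0, fmt, args2);	/* count-only fallback */
	va_end(args2);

	rec = reserve_record(len + 1);
	if (rec) {
		if (tmp && len < SPRINT_BUF_SIZE)
			memcpy(rec, tmp, len + 1);	/* the common case */
		else
			vsnprintf(rec, len + 1, fmt, args);	/* rare second pass */
	}
	if (tmp)
		put_sprint_buf(tmp);
}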

Rasmus


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-24 Thread Rasmus Villemoes
On 23/09/2020 17.11, Petr Mladek wrote:
> On Tue 2020-09-22 17:44:14, John Ogness wrote:
>> vprintk_store() is using a single static buffer as a temporary
>> sprint buffer for the message text. This will not work once
>> @logbuf_lock is removed. Replace the single static buffer with a
>> pool of buffers.
> 
> The buffer is used because we do not know the length of the
> formatted message to reserve the right space in the ring buffer
> in advance.
> 
> There was the idea to call vsprintf(NULL, fmt, args) to count
> the length in advance.

sprintf is dog slow. If you do this, perhaps say "we can afford to use
128 bytes of stack" and do vsnprintf(stackbuf, 128, fmt, args) to do the
counting, and in the vast majority of cases where the text fits we don't
need to do vsprintf() again.

Rasmus


Re: [PATCH printk 3/5] printk: use buffer pool for sprint buffers

2020-09-24 Thread Rasmus Villemoes
On 24/09/2020 10.53, Sergey Senozhatsky wrote:
> On (20/09/24 10:45), Petr Mladek wrote:
>> On Thu 2020-09-24 14:40:58, Sergey Senozhatsky wrote:
>>> On (20/09/23 17:11), Petr Mladek wrote:

 AFAIK, there is one catch. We need to use va_copy() around
 the 1st call because va_format can be proceed only once.

>>>
>>> Current printk() should be good enough for reporting, say, "Kernel
>>> stack overflow" errors. Is extra pressure that va_copy() adds something
>>> that we need to consider?
>>
>> The thing is that vsprintf() traverses the arguments using va_arg().
>> It modifies internal values so that the next va_arg() will read
>> the next value.
> 
> Yes, I understand the purpose of va_copy(). I'm asking if we are
> always on the safe side doing va_copy for every printk (+ potential
> recursive va_copy-s).

va_copy doesn't really add any extra stack use worth talking about. When
ABI says all arguments are passed on stack, va_list is just a pointer
into the arguments, and va_copy merely copies that pointer. When some
arguments are passed in register, the function prologue sets up a
"register save area" where those registers are stashed, and va_list then
contains two pointers: one to the reg save area, one to the place in the
stack where the rest of the arguments are, plus a bit of control
information on how many of the register arguments have been consumed so
far (and that control info is the only reason one must "reset" a
va_list, or rather use a copy that was made before consumption started).
In either case, an extra va_list doesn't cost more than 24 bytes or so
of stack - even if printk() was called with 100 arguments, those
90-100ish arguments are only stored once on the stack.
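
A trivial userspace illustration of that pattern - the copy is taken
before anything is consumed, so the second pass sees the arguments from
the start:

#include <stdarg.h>
#include <stdio.h>

static int count_then_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list args, args2;
	int len;

	va_start(args, fmt);
	va_copy(args2, args);			/* ~a couple dozen bytes of stack */
	len = vsnprintf(NULL, 0, fmt, args);	/* pass 1: length only */
	vsnprintf(buf, size, fmt, args2);	/* pass 2: actual output */
	va_end(args2);
	va_end(args);
	return len;
}

int main(void)
{
	char buf[32];
	int len = count_then_format(buf, sizeof(buf), "%s-%d", "printk", 42);

	printf("needed %d bytes, got '%s'\n", len, buf);
	return 0;
}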

Rasmus


Re: WARNING in ex_handler_uaccess

2020-09-21 Thread Rasmus Villemoes
On 19/09/2020 02.17, Al Viro wrote:
> On Fri, Sep 18, 2020 at 05:07:43PM -0700, Andy Lutomirski wrote:
>> On Fri, Sep 18, 2020 at 4:55 PM Al Viro  wrote:
>>>
>>> On Fri, Sep 18, 2020 at 04:31:33PM -0700, Andy Lutomirski wrote:
>>>
 check_zeroed_user() looks buggy.  It does:

	if (!user_access_begin(from, size))
		return -EFAULT;

	unsafe_get_user(val, (unsigned long __user *) from, err_fault);

 This is wrong if size < sizeof(unsigned long) -- you read outside the
 area you verified using user_access_begin().
>>>
>>> Read the code immediately prior to that.  from will be word-aligned,
>>> and size will be extended accordingly.  If the area acceptable for
>>> user_access_begin() ends *NOT* on a word boundary, you have a problem
>>> and I would strongly recommend to seek professional help.
>>>
>>> All reads in that thing are word-aligned and word-sized.  So I very
>>> much doubt that your analysis is correct.
>>
>> Maybe -ETOOTIRED, but I seriously question the math in here.  Suppose
>> from == (unsigned long *)1 and size == 1.  Then align is 1, and we do:
>>
>> from -= align;
>> size += align;
>>
>> So now from = 0 and size = 2.  Now we do user_access_begin(0, 2) and
>> then immediately read 4 or 8 bytes.  No good.
> 
> Could you explain what kind of insane hardware manages to do #PF-related
> checks (including SMAP, whatever) with *sub*WORD* granularity?
> 
> If it's OK with 16bit read from word-aligned address, but barfs on 64bit
> one...  I want to know what the hell had its authors been smoking.
> 

So, not sure how the above got triggered, but I notice there might be an
edge case in check_zeroed_user():

	from -= align;
	size += align;

	if (!user_read_access_begin(from, size))
		return -EFAULT;

	unsafe_get_user(val, (unsigned long __user *) from, err_fault);


Suppose size is (size_t)-3 and align is 3. What's the convention for
access_ok(whatever, 0)? Is that equivalent to access_ok(whatever, 1), or
is it always true (or $ARCH-dependent)?
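
For concreteness, the wraparound in question (plain userspace arithmetic,
the same thing happens with the kernel's size_t):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	size_t size = (size_t)-3;	/* absurdly large size from the caller */
	size_t align = 3;

	size += align;			/* wraps around to 0 */
	printf("size after alignment fixup: %zu\n", size);
	return 0;
}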

But, AFAICT, no current caller of check_zeroed_user can end up passing
in a size that can overflow to 0. E.g. for the case at hand, size cannot
be more than SIZE_MAX-24.

Rasmus


Re: [PATCH printk 2/3] printk: move dictionary keys to dev_printk_info

2020-09-18 Thread Rasmus Villemoes
On 18/09/2020 14.13, Petr Mladek wrote:
> On Fri 2020-09-18 08:16:37, Rasmus Villemoes wrote:
>> On 17/09/2020 15.16, John Ogness wrote:
>>
>>> if (dev->class)
>>> subsys = dev->class->name;
>>> else if (dev->bus)
>>> subsys = dev->bus->name;
>>> else
>>> -   return 0;
>>> +   return;
>>>  
>>> -   pos += snprintf(hdr + pos, hdrlen - pos, "SUBSYSTEM=%s", subsys);
>>> -   if (pos >= hdrlen)
>>> -   goto overflow;
>>> +   snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), subsys);
>>
>> It's unlikely that subsys would contain a %, but this will be yet
>> another place to spend brain cycles ignoring if doing static analysis.
>> So can we not do this. Either of strXcpy() for X=s,l will do the same
>> thing, and likely faster.
> 
> Good point! Better be on the safe size in a generic printk() API.
> 
> Well, I am afraid that this would be only small drop in a huge lake.
> class->name and bus->name seems to be passed to %s in so many
> *print*() calls all over the kernel code.

So what? printf("%s", some_string_that_might_contain_percent_chars) is
not a problem.

printf(some_string_that_might_contain_percent_chars) is.

And yes, one could do

  snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), "%s", subsys);

but one might as well avoid the snprintf overhead and use one of the
strX functions that have the exact same semantics (copy as much as
there's room for, ensure nul-termination).
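
I.e. simply something like (using strscpy() as the example):

	strscpy(dev_info->subsystem, subsys, sizeof(dev_info->subsystem));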

Rasmus


Re: [PATCH printk 2/3] printk: move dictionary keys to dev_printk_info

2020-09-17 Thread Rasmus Villemoes
On 17/09/2020 15.16, John Ogness wrote:

>   if (dev->class)
>   subsys = dev->class->name;
>   else if (dev->bus)
>   subsys = dev->bus->name;
>   else
> - return 0;
> + return;
>  
> - pos += snprintf(hdr + pos, hdrlen - pos, "SUBSYSTEM=%s", subsys);
> - if (pos >= hdrlen)
> - goto overflow;
> + snprintf(dev_info->subsystem, sizeof(dev_info->subsystem), subsys);

It's unlikely that subsys would contain a %, but this will be yet
another place to spend brain cycles ignoring if doing static analysis.
So can we not do this. Either of strXcpy() for X=s,l will do the same
thing, and likely faster.

Rasmus


Re: [PATCH v2] scripts/setlocalversion: make git describe output more reliable

2020-09-17 Thread Rasmus Villemoes
On 17/09/2020 14.22, Nico Schottelius wrote:
> 
> Thanks for the patch Rasmus. Overall it looks good to me, be aligned to
> the stable patch submission rules makes sense. A tiny thing though:
> 
> I did not calculate the exact collision probability with 12 characters

For reference, the math is something like this: Consider a repo with N+1
objects. We look at one specific object (for setlocalversion it's the
head commit being built, for the stable rules it's whatever particular
commit one is interested in for backporting), and want to know the
probability that its sha1 collides with some other object in the first b
bits (here b=48). Assuming the sha1s are independent and uniformly
distributed, the probability of not colliding with one specific other
commit is x=1-1/2^b, and the probability of not colliding with any of
the other N commits is x^N, making the probability of a collision 1-x^N
= (1-x)(1+x+x^2+...+x^{N-1}). Now the N terms in the second factor are
very-close-to-but-slightly-smaller-than 1, so an upper bound for this
probability is (1-x)N = N/2^b, which is also what one would naively
expect. [This estimate is always valid, but it becomes a void statement
of "the probability is less then 1" when N is >= 2^b].

So, assuming some vendor kernel repo that has all of Greg's stable.git
(around 10M objects I think) and another 10M objects because random
vendor, that works out to 20e6/2^48 = 7.1e-8, 71 ppb.
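
For anyone wanting to redo that arithmetic, it is just N/2^b:

#include <stdio.h>
#include <math.h>

int main(void)
{
	double objects = 20e6;	/* stable.git plus assorted vendor objects */
	double bits = 48;	/* 12 hex chars */

	printf("collision probability <= %.1e\n", objects / pow(2, bits));
	return 0;
}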

> So I suggest you introduce something on the line of:
> 
> ...
> num_chars=12
> ...
> --abbrev=$num_chars

I considered that, but it becomes quite ugly since it needs to get into
the awk script (as a 13, though perhaps we could get awk to do the +1, I
don't really speak awk), where we'd then need to use " instead of ' and
then escape the $ that are to be interpreted by awk and not the shell.
So I think it's more readable with hardcoding and comments explaining
why they are there; should anyone ever want to change 12.

Rasmus


[PATCH v2] scripts/setlocalversion: make git describe output more reliable

2020-09-16 Thread Rasmus Villemoes
When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, make sure the abbreviated
sha1 always consists of exactly 12 hex characters. That is consistent
with the current rule for -stable patches, and is almost always enough
to identify the head commit unambiguously - in the few cases where it
does not, the v5.4.3-00021- prefix would certainly nail it down.

Signed-off-by: Rasmus Villemoes 
---
v2: use 12 instead of 15, and ensure that the result does have exactly
12 hex chars.

 scripts/setlocalversion | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 20f2efd57b11..bb709eda96cd 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()
 
# Check for git and a git repo.
if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-  head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+  head=$(git rev-parse --verify HEAD 2>/dev/null); then
 
# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
# it, because this version is defined in the top level Makefile.
@@ -59,11 +59,22 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-   if atag="$(git describe 2>/dev/null)"; then
-   echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),$(NF))}'
-
-   # If we don't have a tag at all we print -g{commitish}.
+   #
+   # Ensure the abbreviated sha1 has exactly 12
+   # hex characters, to make the output
+   # independent of git version, local
+   # core.abbrev settings and/or total number of
+   # objects in the current repository - passing
+   # --abbrev=12 ensures a minimum of 12, and the
+   # awk substr() then picks the 'g' and first 12
+   # hex chars.
+   if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+   echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}'
+
+   # If we don't have a tag at all we print -g{commitish},
+   # again using exactly 12 hex chars.
else
+   head="$(echo $head | cut -c1-12)"
printf '%s%s' -g $head
fi
fi
-- 
2.23.0



Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-16 Thread Rasmus Villemoes
On 16/09/2020 16.28, Masahiro Yamada wrote:
> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes
>  wrote:
>>
>> On 10/09/2020 21.05, Brian Norris wrote:
>>> On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada  
>>> wrote:
>>>> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
>>>>  wrote:
>>>>> So in order to avoid `uname -a` output relying on such random details
>>>>> of the build environment which are rather hard to ensure are
>>>>> consistent between developers and buildbots, use an explicit
>>>>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
>>>>> for the unlikely case of no signed tags being usable).
>>>
>>> For the patch:
>>>
>>> Reviewed-by: Brian Norris 
>>>
>>>> I agree that any randomness should be avoided.
>>>>
>>>> My question is, do we need 15-digits?
>>> ...
>>>> So, I think the conflict happens
>>>> only when we have two commits that start with the same 7-digits
>>>> in the _same_ release. Is this correct?
>>
>> No.
>>
>>> For git-describe (the case where we have a tag to base off):
>>> "use  digits, or as many digits as needed to form a unique object name"
>>
>> Yes, the abbreviated hash that `git describe` produces is unique among
>> all objects (and objects are more than just commits) in the current
>> repo, so what matters for probability-of-collision is the total number
>> of objects - linus.git itself probably grows ~60000 objects per release.
>>
>> As for "do we need 15 digits", well, in theory the next time I merge the
>> next rt-stable tag into our kernel I could end up with a commit that
>> matches some existing object in the first 33 hex chars at which point I
>> should have chosen 34 - but of course that's so unlikely that it's
>> irrelevant.
>>
>> But the upshot of that is that there really is no objective answer to
>> "how many digits do we need", so it becomes a tradeoff between "low
>> enough probability that anyone anywhere in the next few years would have
>> needed more than X when building their own kernel" and readability of
>> `uname -r` etc. So I decided somewhat arbitrarily that each time one
>> rolls a new release, there should be less than 1e-9 probability that
>> HEAD collides with some other object when abbreviated to X hexchars.
>> X=12 doesn't pass that criteria even when the repo has only 10M objects
>> (and, current linus.git already has objects that need 12 to be unique,
>> so such collisions do happen in practice, though of course very rarely).
>> 13 and 14 are just weird numbers, so I ended with 15, which also allows
>> many many more objects in the repo before the probability crosses that
>> 1e-9 threshold.
>>
> 
> 
> This is because you use the output from git as-is.
> 
> Why are you still trying to rely on such obscure behavior of git?
> 
> 
> It is pretty easy to get the fixed number of digits reliably.
> 
> For example,
> git rev-parse --verify HEAD 2>/dev/null | cut -c1-7
> 
> 
> It always returns a 7-digits hash,
> and two different people will get the same result for sure.
> 
> Your solution, --short=15, means "at least 15",
> which still contains ambiguity.
> 
> 
> 
> As I already noted, the kernelrelease string is
> constructed in this format:
> 
> <version>-<number of commits>-<abbreviated hash>
> 
> 
> For example, if I enable CONFIG_LOCALVERSION_AUTO=y
> in today's Linus tree, I got this:
> 
> 5.9.0-rc5-5-gfc4f28bb3daf
> 
> 
> What if the number of digits were 7?
> 
> 5.9.0-rc5-5-gfc4f28b
> 
> I see no problem here.

The problem is that currently, the build relies on things which cannot
easily be controlled or reproduced between different developers: Apart
from git version (which is reasonable to mandate is the same, just like
one must use same compiler, binutils etc. to get binary reproducible
output), it depends on whether ~/.gitconfig has a core.abbrev setting -
and even worse, it depends on the _total number of objects in the source
git repo_, i.e. something that has nothing to do with what is currently
in the work tree at all.

And that leads to real bugs when one builds external modules that end up
in one directory in the rootfs, while `uname -r` tells modprobe to look
in some other directory (differing in the length of the abbreviated hash).

> Even if we have another object that starts with "fc4f28b",
> the kernelrelease string is still unique thanks to the
> <version>-<number of commits>- prefix.
> 
> Why do we care about the uniqueness of the abbreviated
> hash in the whole git history?

Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-16 Thread Rasmus Villemoes
On 16/09/2020 20.01, Masahiro Yamada wrote:
> On Thu, Sep 17, 2020 at 12:23 AM Rasmus Villemoes
>  wrote:
>>
>> On 16/09/2020 16.28, Masahiro Yamada wrote:
>>> On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes
>>>  wrote:
>>>>

>>> Why do we care about the uniqueness of the abbreviated
>>> hash in the whole git history?
>>
>> Because when I ask a customer "what kernel are you running", and they
>> tell me "4.19.45-rt67-00567-123abc8", I prefer that 123abc8 uniquely
>> identifies the right commit, instead of me having to dig around each of
>> the commits starting with that prefix and see which one of them matches
>> "is exactly 567 commits from 4.19.45-rt67". 7 hexchars is nowhere near
>> enough for that today, which is why Linus himself did that "auto-tune
>> abbrev based on repo size" patch for git.git.
> 
> I like:
> 
>git rev-parse --verify HEAD 2>/dev/null | cut -c1-15
> 
> better than:
> 
>git rev-parse --verify --short=15 HEAD 2>/dev/null
> 
> The former produces the deterministic kernelrelease string.
> 
> 
> But, let's reconsider if we need as long as 15-digits.
> 
> There are a couple of kinds of objects in git: commit, tree, blob.
> 
> I think you came up with 15-digits to ensure the uniqueness
> in _all_ kinds of objects.
> 
> But, when your customer says "4.19.45-rt67-00567-123abc8",
> 123abc8 is apparently a commit instead of a tree or a blob.
> 
> 
> 
> In the context of the kernel version, we can exclude
> tree and blob objects.
> 
> In other words, I think "grows ~6 objects per release"
> is somewhat over-estimating.
> 
> We have ~15000 commits per release. So, the difference
> is just 4x factor, though...
> 
> 
> 
> BTW, I did some experiments, and I noticed
> "git log" accepts a shorter hash
> than "git show" does presumably because
> "git log" only takes a commit.
> 
> 
> 
> For example, "git show 06a0d" fails, but
> "git log 06a0d" works.
> 
> 
> masahiro@oscar:~/ref/linux$ git  show   06a0d
> error: short SHA1 06a0d is ambiguous
> hint: The candidates are:
> hint:   06a0df4d1b8b commit 2020-09-08 - fbcon: remove now unusued
> 'softback_lines' cursor() argument
> hint:   06a0d81911b3 tree
> hint:   06a0dc5a84d2 tree
> hint:   06a0d1947c77 blob
> hint:   06a0df434249 blob
> fatal: ambiguous argument '06a0d': unknown revision or path not in the
> working tree.
> Use '--' to separate paths from revisions, like this:
> 'git  [...] -- [...]'
> masahiro@oscar:~/ref/linux$ git  log --oneline  -1   06a0d
> 06a0df4d1b8b fbcon: remove now unusued 'softback_lines' cursor() argument
> 
> 
> 
> 
> What is interesting is, if you prepend --
> to the abbreviated hash, "git show" accepts as short as a commit
> "git log" does.
> 
> masahiro@oscar:~/ref/linux$ git  show   v5.9-rc5-2-g06a0d  | head -1
> commit 06a0df4d1b8b13b551668e47b11fd7629033b7df
> 
> 
> I guess git cleverly understands it refers to a commit object
> since "v5.9-rc5-2-" prefix only makes sense
> in the commit context.
> 

Well, yes, but unfortunately only so far as applying the same "the user
means a commit object" logic; git doesn't do anything to actually use
the prefix to disambiguate. In fact, looking at a semi-random commit in
4.19-stable v4.19.114-7-g236c445eb352:

$ git show 236c44
error: short SHA1 236c44 is ambiguous
hint: The candidates are:
hint:   236c445eb352 commit 2020-03-13 - drm/bochs: downgrade
pci_request_region failure from error to warning
hint:   236c448cb6e7 commit 2011-09-08 - usbmon vs. tcpdump: fix dropped
packet count
hint:   236c445b1930 blob

Now you're right that we get

$ git show v4.19.114-7-g236c445 | head -n1
commit 236c445eb3529aa7c976f8812513c3cb26d77e27

so it knows we're not looking at that blob. But "git show
v4.19.114-7-g236c44" still fails because there are two commits. Adding
to the fun is that "git show v4.19.114-7-g236c448" actually shows the
usbmon commit, which is old v3.2 stuff and certainly not a descendant of
v4.19.114.

I didn't actually know that git would even accept those prefixed strings
as possible refspecs, and for a moment you had me hoping that git would
actually do the disambiguation using that prefix, which would certainly
make 7 hex chars enough; unfortunately that's not the case.

> Keeping those above in mind, I believe 15-digits is too long.

Well, as you indicated, commits are probably ~1/4 of all objects, i.e.
half a hexchar worth of bits. So the commit/ob

Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-16 Thread Rasmus Villemoes
On 10/09/2020 16.34, Masahiro Yamada wrote:
> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
>  wrote:
>>
...
>>
>> So in order to avoid `uname -a` output relying on such random details
>> of the build environment which are rather hard to ensure are
>> consistent between developers and buildbots, use an explicit
>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
>> for the unlikely case of no signed tags being usable).
>>

Hi Masahiro

Can I get you to carry this through the kbuild tree? Unless of course
your questions/concerns haven't been addressed.

Thanks,
Rasmus


Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-11 Thread Rasmus Villemoes
On 10/09/2020 21.05, Brian Norris wrote:
> On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada  wrote:
>> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
>>  wrote:
>>> So in order to avoid `uname -a` output relying on such random details
>>> of the build environment which are rather hard to ensure are
>>> consistent between developers and buildbots, use an explicit
>>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
>>> for the unlikely case of no signed tags being usable).
> 
> For the patch:
> 
> Reviewed-by: Brian Norris 
> 
>> I agree that any randomness should be avoided.
>>
>> My question is, do we need 15-digits?
> ...
>> So, I think the conflict happens
>> only when we have two commits that start with the same 7-digits
>> in the _same_ release. Is this correct?

No.

> For git-describe (the case where we have a tag to base off):
> "use  digits, or as many digits as needed to form a unique object name"

Yes, the abbreviated hash that `git describe` produces is unique among
all objects (and objects are more than just commits) in the current
repo, so what matters for probability-of-collision is the total number
of objects - linus.git itself probably grows ~6 objects per release.

As for "do we need 15 digits", well, in theory the next time I merge the
next rt-stable tag into our kernel I could end up with a commit that
matches some existing object in the first 33 hex chars at which point I
should have chosen 34 - but of course that's so unlikely that it's
irrelevant.

But the upshot of that is that there really is no objective answer to
"how many digits do we need", so it becomes a tradeoff between "low
enough probability that anyone anywhere in the next few years would have
needed more than X when building their own kernel" and readability of
`uname -r` etc. So I decided somewhat arbitrarily that each time one
rolls a new release, there should be less than 1e-9 probability that
HEAD collides with some other object when abbreviated to X hexchars.
X=12 doesn't pass that criterion even when the repo has only 10M objects
(and current linus.git already has objects that need 12 to be unique,
so such collisions do happen in practice, though of course very rarely).
13 and 14 are just weird numbers, so I ended with 15, which also allows
many many more objects in the repo before the probability crosses that
1e-9 threshold.
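
A back-of-envelope version of that criterion, assuming abbreviated
hashes behave like uniformly random bits (so the chance that one new
commit shares its first X hex chars with any of N existing objects is
roughly N/16^X):

  N = 1e7, X = 12:  1e7 / 16^12 ~= 3.6e-8   (misses the 1e-9 target)
  N = 1e7, X = 15:  1e7 / 16^15 ~= 8.7e-12  (comfortably below it)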

Rasmus

Side note 1: Note that there really isn't any such thing as "last
tag/previous tag" in a DAG; I think what git does is a best-effort
search for the eligible tag that minimizes #({objects reachable from
commit-to-be-described} - {objects reachable from tag}), where eligible
tag means at least reachable from commit-to-be-described (so that set
difference is a proper one), but there can be additional constraints
(e.g. --match=...).

Side note 2: Linus or Greg releases are actually not interesting here
(see the logic in setlocalversion that stops early if we're exactly at
an annotated tag) - the whole raison d'etre for setlocalversion is that
people do maintain and build non-mainline kernels, and it's extremely
useful for `uname -a` to accurately tell just which commit is running on
the target.


[PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-10 Thread Rasmus Villemoes
When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, use an explicit
--abbrev=15 option (and for consistency, also use rev-parse --short=15
for the unlikely case of no signed tags being usable).

Now, why is 60 bits enough for everyone? It's not mathematically
guaranteed that git won't have to use 16 in some git repo, but it is
beyond unlikely: Even in a repo with 100M objects, the probability
that any given commit (i.e. the one being described) clashes with some
other object in the first 15 hex chars is less than 1e-10, and
currently a git repo tracking Linus', -stable and -rt only has around
10M objects.

Signed-off-by: Rasmus Villemoes 
---
I could probably fix things by adding a 'git config --local
core.abbrev 15' step to the Yocto build process after the repo to
build from has been cloned but before building has started. But in the
interest of binary reproducibility outside of just Yocto builds, I
think it's better if this lives in the kernel.

 scripts/setlocalversion | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 20f2efd57b11..c5262f0d953d 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()
 
# Check for git and a git repo.
if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-  head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+  head=$(git rev-parse --verify --short=15 HEAD 2>/dev/null); then
 
# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
# it, because this version is defined in the top level Makefile.
@@ -59,7 +59,7 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-   if atag="$(git describe 2>/dev/null)"; then
+   if atag="$(git describe --abbrev=15 2>/dev/null)"; then
echo "$atag" | awk -F- '{printf("-%05d-%s", 
$(NF-1),$(NF))}'
 
# If we don't have a tag at all we print -g{commitish}.
-- 
2.23.0



Re: [PATCH] /dev/zero: also implement ->read

2020-09-06 Thread Rasmus Villemoes
On 07/09/2020 08.20, Christoph Hellwig wrote:
> On Mon, Sep 07, 2020 at 12:34:37AM +0200, Rasmus Villemoes wrote:
>> On 03/09/2020 17.59, Christoph Hellwig wrote:
> 
>>> +static ssize_t read_zero(struct file *file, char __user *buf,
>>> +size_t count, loff_t *ppos)
>>> +{
>>> +   size_t cleared = 0;
>>> +
>>> +   while (count) {
>>> +   size_t chunk = min_t(size_t, count, PAGE_SIZE);
>>> +
>>> +   if (clear_user(buf + cleared, chunk))
>>> +   return cleared ? cleared : -EFAULT;
>>
>> Probably nobody really cares, but currently doing
>>
>> read(fd, &unmapped_page - 5, 123);
>>
>> returns 5, and those five bytes do get cleared; if I'm reading the above
>> right you'd return -EFAULT for that case.
>>
>>
>>> +   cleared += chunk;
>>> +   count -= chunk;
>>> +
>>> +   if (signal_pending(current))
>>> +   return cleared ? cleared : -ERESTARTSYS;
>>
>> I can't see how we can get here without 'cleared' being positive, so
>> this can just be 'return cleared' (and if you fix the above EFAULT case
>> to more accurately track how much got cleared, there's probably no
>> longer any code to be symmetric with anyway).
> 
> Yeah, I'll fix these up and resend.
> 

Actually, while you're micro-optimizing it, AFAIK VFS already handles
count==0, so you can avoid the initial branch and the last
cond_resched() by writing it something like

  while (1) {
	size_t chunk = min_t(size_t, count, PAGE_SIZE), c;

	c = chunk - clear_user(buf + cleared, chunk);
	if (unlikely(!c))
		return cleared ?: -EFAULT;
	cleared += c;
	count -= c;
	if (!count || signal_pending(current))
		return cleared;
	cond_resched();
  }

For the dd test case with the default bs=512 that avoids cond_resched()
and signal_pending() altogether.

Rasmus


Re: [PATCH] /dev/zero: also implement ->read

2020-09-06 Thread Rasmus Villemoes
On 03/09/2020 17.59, Christoph Hellwig wrote:
> Christophe reported a major speedup due to avoiding the iov_iter
> overhead, so just add this trivial function.  Note that /dev/zero
> already implements both an iter and non-iter writes so this just
> makes it more symmetric.
> 
> Christophe Leroy 

?-by ?

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/char/mem.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index abd4ffdc8cdebc..1dc99ab158457a 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -726,6 +726,27 @@ static ssize_t read_iter_zero(struct kiocb *iocb, struct 
> iov_iter *iter)
>   return written;
>  }
>  
> +static ssize_t read_zero(struct file *file, char __user *buf,
> +  size_t count, loff_t *ppos)
> +{
> + size_t cleared = 0;
> +
> + while (count) {
> + size_t chunk = min_t(size_t, count, PAGE_SIZE);
> +
> + if (clear_user(buf + cleared, chunk))
> + return cleared ? cleared : -EFAULT;

Probably nobody really cares, but currently doing

read(fd, &unmapped_page - 5, 123);

returns 5, and those five bytes do get cleared; if I'm reading the above
right you'd return -EFAULT for that case.


> + cleared += chunk;
> + count -= chunk;
> +
> + if (signal_pending(current))
> + return cleared ? cleared : -ERESTARTSYS;

I can't see how we can get here without 'cleared' being positive, so
this can just be 'return cleared' (and if you fix the above EFAULT case
to more accurately track how much got cleared, there's probably no
longer any code to be symmetric with anyway).

Rasmus


Re: [PATCH rdma-next 2/4] gcov: Use proper duplication routine for const pointer

2020-09-03 Thread Rasmus Villemoes
On 02/09/2020 10.55, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> The filename is a const pointer, so use the proper string duplication
> routine that takes into account const identifier.

This commit log makes no sense at all.

kstrdup_const is merely an optimization that can be used when there's a
good chance that the passed string lives in vmlinux' .rodata, in which
case it is known to be immortal, and we can avoid allocating heap memory
to contain a duplicate. [It also requires that the caller has no
intention of modifying the returned string.]

In the case of something called ->filename, I assume it's initialized
with __FILE__ somewhere, making the above true for built-in stuff but
not for modules. So if the gcov_info can live longer than the module,
it's of course necessary to duplicate the string, but OTOH making an
optimization for the built-in stuff makes sense. So this is certainly
one of the places where kstrdup_const() seems applicable. But it has
nothing whatsoever to do with the C-level qualifiers the argument may have.
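
A minimal sketch of the pattern in question (hypothetical code, not the
actual gcov patch - the struct, field and function names are made up):

  #include <linux/errno.h>
  #include <linux/slab.h>
  #include <linux/string.h>	/* kstrdup_const(), kfree_const() */

  struct info {
	const char *filename;	/* typically set from __FILE__ */
  };

  static int info_set_filename(struct info *info, const char *name)
  {
	/*
	 * For built-in callers name usually points into .rodata and no
	 * allocation happens; for modules a real copy is made.
	 */
	info->filename = kstrdup_const(name, GFP_KERNEL);
	return info->filename ? 0 : -ENOMEM;
  }

  static void info_release(struct info *info)
  {
	/* Only frees if kstrdup_const() actually allocated. */
	kfree_const(info->filename);
  }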

Rasmus


Re: [PATCH 2/2] mtd: spi-nor: enable lock interface for macronix chips

2020-08-31 Thread Rasmus Villemoes
On 12/08/2020 17.18, Ivan Mikhaylov wrote:
> Add locks for whole macronix chip series with BP0-2 and BP0-3 bits.
> 
> Tested with mx25l51245g(BP0-3).

Hmm. I've tried adding support for locking on Macronix to U-Boot
(https://patchwork.ozlabs.org/project/uboot/patch/20200326114257.1782-3-rasmus.villem...@prevas.dk/),
but that was quite a bit more involved than this. Note in particular the
first part of my commit message:

  Macronix chips implements locking in (power-of-two multiple of) 64K
  blocks, not as a fraction of the chip's size.

At least, that was true for the chip I was interested in and the few
others whose data sheets I grabbed to double-check. So I'm a bit
skeptical that this can work out-of-the-box without introducing a new
struct spi_nor_locking_ops.

Rasmus


Re: [PATCH] linux/kernel.h: add container_from()

2020-08-27 Thread Rasmus Villemoes
On 27/08/2020 03.36, Allen Pais wrote:
> Introduce container_from() as a generic helper instead of
> sub-systems defining a private from_* API
> (Eg: from_tasklets recently introduced in
> 12cc923f1ccc: Tasklet: Introduce new initialization API)
> 
> The helper is similar to container_of() in argument order
> with the difference of naming the containing structure instead
 ^^^

> of having to specify its type.
> 

> +/**
> + * container_from - cast a member of a structure out to the containing 
> structure
> + * @ptr: the pointer to the member.
> + * @container:   the type of the container struct.
^
This seems to have been copy-pasted from container_of? Shouldn't
@container be the (local) value we're storing into? As in foo =
container_from(..., foo, ...)? Or am I misunderstanding the purpose of this?

[And I think it would read nicer if the bikeshed was called
to_container(), but don't care deeply.]

Rasmus


Re: [PATCH] usb: atm: don't use snprintf() for sysfs attrs

2020-08-27 Thread Rasmus Villemoes
On 27/08/2020 15.18, Alex Dewar wrote:
> On Thu, Aug 27, 2020 at 09:15:37AM +0200, Greg Kroah-Hartman wrote:
>> On Thu, Aug 27, 2020 at 08:42:06AM +0200, Rasmus Villemoes wrote:
>>> On 25/08/2020 00.23, Alex Dewar wrote:
>>>> kernel/cpu.c: don't use snprintf() for sysfs attrs
>>>>
>>>> As per the documentation (Documentation/filesystems/sysfs.rst),
>>>> snprintf() should not be used for formatting values returned by sysfs.
>>>>
>>>
>>> Can we have a sysfs_sprintf() (could just be a macro that does sprintf)
>>> to make it clear to the next reader that we know we're in a sysfs show
>>> method? It would make auditing uses of sprintf() much easier.
>>
>> Code churn to keep code checkers quiet for pointless reasons?  What
>> could go wrong with that...

I did not (mean to) suggest replacing existing sprintf() calls in sysfs
show methods. But when changes _are_ being made, such as when replacing
snprintf() calls for whatever reasons, can we please not make it harder
for people doing manual audits (those are "code checkers" as well, I
suppose, but they do tend to only make noise when finding something).

>> It should be pretty obvious to any reader that you are in a sysfs show
>> method, as almost all of them are trivially tiny and obvious.

git grep doesn't immediately show that, not even with a suitable -C
argument, as you can't really know the potential callers unless you open
the file and see that the function is only assigned as a .show method.
And even that can be a pain because it's all hidden behind five levels
of magic macros that build identifiers with ##.
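
A hypothetical driver snippet showing the kind of ##-based indirection
meant here (names made up):

  #include <linux/device.h>

  static ssize_t foo_show(struct device *dev,
			  struct device_attribute *attr, char *buf)
  {
	return sprintf(buf, "%d\n", 42);
  }
  /*
   * foo_show is only referenced via token pasting inside this macro,
   * so grepping for callers of foo_show finds nothing.
   */
  static DEVICE_ATTR_RO(foo);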

> Perhaps I should have mentioned this in the commit message, but the problem
> is that snprintf() doesn't return the number of bytes written to the
> destination buffer,

I'm perfectly well aware of that, TYVM (you may want to 'git log
--author Villemoes lib/vsprintf.c').

> but the number of bytes that *would have been written if
> they fitted*, which may be more than the bounds specified [1]. So "return
> snprintf(...)" for sysfs attributes is an antipattern. If you need bounded
> string ops, scnprintf() is the way to go. Using snprintf() can give a
> false sense of security, because it isn't necessarily safe.

Huh? This all seems utterly irrelevant WRT a change that replaces
PAGE_SIZE by INT_MAX (because that's what sprintf() is going to pretend
you passed). You get the same return value.

But I'm not at all concerned about whether one passes the proper buffer
size or not in sysfs show methods; with my embedded hat on, I'm all for
saving a few bytes of .text here and there. The problem, as far as I'm
concerned, is merely that adding sprintf() callers makes it harder to
find the problematic sprintf() instances.


>> Anyway, we've had this for 20 years,

My bad, for a moment I forgot that code and patterns of that vintage
cannot have bugs.

Rasmus


Re: [PATCH] usb: atm: don't use snprintf() for sysfs attrs

2020-08-26 Thread Rasmus Villemoes
On 25/08/2020 00.23, Alex Dewar wrote:
> kernel/cpu.c: don't use snprintf() for sysfs attrs
> 
> As per the documentation (Documentation/filesystems/sysfs.rst),
> snprintf() should not be used for formatting values returned by sysfs.
>

Sure. But then the security guys come along and send a patch saying
"sprintf is evil, always pass a buffer bound". Perhaps with a side of
"this code could get copy-pasted, better not promote the use of sprintf
more than strictly necessary".

Can we have a sysfs_sprintf() (could just be a macro that does sprintf)
to make it clear to the next reader that we know we're in a sysfs show
method? It would make auditing uses of sprintf() much easier.
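
Something as small as this would do (sketch only, not an existing
kernel macro):

  #define sysfs_sprintf(buf, fmt, ...) sprintf(buf, fmt, ##__VA_ARGS__)

Same code generation as plain sprintf(), but trivially greppable.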

>  static ssize_t cxacru_sysfs_showattr_LINE(u32 value, char *buf)
> @@ -275,8 +275,8 @@ static ssize_t cxacru_sysfs_showattr_LINE(u32 value, char 
> *buf)
>   "waiting", "initialising"
>   };
>   if (unlikely(value >= ARRAY_SIZE(str)))
> - return snprintf(buf, PAGE_SIZE, "%u\n", value);
> - return snprintf(buf, PAGE_SIZE, "%s\n", str[value]);
> + return sprintf(buf, "%u\n", value);
> + return sprintf(buf, "%s\n", str[value]);
>  }

Not this patch in particular, but in some cases the string being printed
is not immediately adjacent to the sprintf call, making it rather hard
to subsequently verify that yes, that string is certainly reasonably
bounded. If you really want to save the few bytes of code that is the
practical effect of eliding the PAGE_SIZE argument, how about a
sysfs_print_string(buf, str) helper that prints the string and appends a
newline; that's another argument saved. And again it would make it
obvious to a reader that that particular helper is only to be used in
sysfs show methods.
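
Roughly (again just a sketch of the suggestion, not an existing
helper):

  static inline ssize_t sysfs_print_string(char *buf, const char *str)
  {
	return sprintf(buf, "%s\n", str);
  }

so the example above would become

  return sysfs_print_string(buf, str[value]);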

Rasmus


Re: [PATCH] checkpatch: Allow not using -f with files that are in git

2020-08-25 Thread Rasmus Villemoes
On 25/08/2020 02.09, Joe Perches wrote:
> If a file exists in git and checkpatch is used without the -f
> flag for scanning a file, then checkpatch will scan the file
> assuming it's a patch and emit:
> 
> ERROR: Does not appear to be a unified-diff format patch
> 
> Change the behavior to assume the -f flag if the file exists
> in git.

Heh, I read the patch subject to mean you introduced a way for subsystem
maintainers to prevent running checkpatch -f on their files, which I
think some would like ;)

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 79fc357b18cd..cdee7cfadc11 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -976,6 +976,16 @@ sub seed_camelcase_includes {
>   }
>  }
>  
> +sub git_is_single_file {
> + my ($filename) = @_;
> +
> + return 0 if ((which("git") eq "") || !(-e "$gitroot"));
> +
> + my $output = `${git_command} ls-files -- $filename`;
> + my $count = $output =~ tr/\n//;
> + return $count eq 1 && $output =~ m{^${filename}$};
> +}

Isn't that somewhat expensive to do for each file? Why not postpone that
check till we're about to complain that the file is not a diff (haven't
looked at how such a refactoring would look).

Rasmus


Re: [PATCH] lib: Convert test_printf.c to KUnit

2020-08-21 Thread Rasmus Villemoes
On 21/08/2020 13.37, Petr Mladek wrote:
> On Mon 2020-08-17 09:06:32, Rasmus Villemoes wrote:
>> On 17/08/2020 06.30, Arpitha Raghunandan wrote:
>>> Converts test lib/test_printf.c to KUnit.
>>> More information about KUnit can be found at
>>> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
>>> KUnit provides a common framework for unit tests in the kernel.
>>
>> So I can continue to build a kernel with some appropriate CONFIG set to
>> y, boot it under virt-me, run dmesg and see if I broke printf? That's
>> what I do now, and I don't want to have to start using some enterprisy
>> framework.
> 
> I had the same concern. I have tried it.

Thanks for doing that and reporting the results.

> #> modprobe printf_kunit
> 
> produced the following in dmesg:
> 
> [   60.931175] printf_kunit: module verification failed: signature and/or 
> required key missing - tainting kernel
> [   60.942209] TAP version 14
> [   60.945197] # Subtest: printf-kunit-test
> [   60.945200] 1..1
> [   60.951092] ok 1 - selftest
> [   60.953414] ok 1 - printf-kunit-test
> 
> I could live with the above. Then I tried to break a test by the following 
> change:
> 
> 
> diff --git a/lib/printf_kunit.c b/lib/printf_kunit.c
> index 68ac5f9b8d28..1689dadd70a3 100644
> --- a/lib/printf_kunit.c
> +++ b/lib/printf_kunit.c
> @@ -395,7 +395,7 @@ ip4(struct kunit *kunittest)
> sa.sin_port = cpu_to_be16(12345);
> sa.sin_addr.s_addr = cpu_to_be32(0x7f01);
>  
> -   test(kunittest, "127.000.000.001|127.0.0.1", "%pi4|%pI4", 
> &sa.sin_addr, &sa.sin_addr);
> +   test(kunittest, "127-000.000.001|127.0.0.1", "%pi4|%pI4", 
> &sa.sin_addr, &sa.sin_addr);
> test(kunittest, "127.000.000.001|127.0.0.1", "%piS|%pIS", &sa, &sa);
> sa.sin_addr.s_addr = cpu_to_be32(0x01020304);
> test(kunittest, "001.002.003.004:12345|1.2.3.4:12345", "%piSp|%pISp", 
> &sa, &sa);
> 
> 
> It produced::
> 
> [   56.786858] TAP version 14
> [   56.787493] # Subtest: printf-kunit-test
> [   56.787494] 1..1
> [   56.788612] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76
>Expected memcmp(test_buffer, expect, written) == 0, but
>memcmp(test_buffer, expect, written) == 1
>0 == 0
>vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote 
> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
> [   56.795433] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:76
>Expected memcmp(test_buffer, expect, written) == 0, but
>memcmp(test_buffer, expect, written) == 1
>0 == 0
>vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote 
> '127.000.000.001|127', expected '127-000.000.001|127'
> [   56.800909] # selftest: EXPECTATION FAILED at lib/printf_kunit.c:108
>Expected memcmp(p, expect, elen+1) == 0, but
>memcmp(p, expect, elen+1) == 1
>0 == 0
>kvasprintf(..., "%pi4|%pI4", ...) returned 
> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
> [   56.806497] not ok 1 - selftest
> [   56.806497] not ok 1 - printf-kunit-test
> 
> while the original code would have written the following error messages:
> 
> [   95.859225] test_printf: loaded.
> [   95.860031] test_printf: vsnprintf(buf, 256, "%pi4|%pI4", ...) wrote 
> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
> [   95.862630] test_printf: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote 
> '127.0', expected '127-0'
> [   95.864118] test_printf: kvasprintf(..., "%pi4|%pI4", ...) returned 
> '127.000.000.001|127.0.0.1', expected '127-000.000.001|127.0.0.1'
> [   95.866589] test_printf: failed 3 out of 388 tests
> 
> 
> Even the error output is acceptable for me.

Urgh. Yeah, perhaps, but the original is much more readable; it really
doesn't matter that a memcmp() fails to compare equal to 0, that's
merely how we figured out that the output was wrong.


> I am just curious why
> the 2nd failure is different:
> 
>+ original code: vsnprintf(buf, 6, "%pi4|%pI4", ...) wrote '127.0', 
> expected '127-0'
>+ kunit code: vsnprintf(buf, 20, "%pi4|%pI4", ...) wrote 
> '127.000.000.001|127', expected '127-000.000.001|127'

Re: [PATCH 0/4] -ffreestanding/-fno-builtin-* patches

2020-08-20 Thread Rasmus Villemoes
On 20/08/2020 19.56, Arvind Sankar wrote:
> On Thu, Aug 20, 2020 at 04:56:02PM +0200, Rasmus Villemoes wrote:
>> On 18/08/2020 23.41, Arvind Sankar wrote:
>>>
>>> Note that -fno-builtin-foo seems to mean slightly different things in
>>> clang and gcc. From experimentation, clang will neither optimize a call
>>> to foo, nor perform an optimization that introduces a call to foo. gcc
>>> will avoid optimizing calls to foo, but it can still generate new calls
>>> to foo while optimizing something else. Which means that
>>> -fno-builtin-{bcmp,stpcpy} only solves things for clang, not gcc. It's
>>> just that gcc doesn't seem to have implemented those optimizations.
>>>
>>
>> I think it's more than that. I've always read gcc's documentation
>>
>> '-fno-builtin'
>> '-fno-builtin-FUNCTION'
>>  Don't recognize built-in functions that do not begin with
>>  '__builtin_' as prefix. ...
>>
>>  GCC normally generates special code to handle certain built-in
>>  functions more efficiently; for instance, calls to 'alloca' may
>>  become single instructions which adjust the stack directly, and
>>  calls to 'memcpy' may become inline copy loops.
>>  ...
>>
>> to mean exactly that observed above and nothing more, i.e. that
>> -fno-builtin-foo merely means that gcc stops treating a call of a
>> function named foo to mean a call to a function implementing the
>> standard function by that name (and hence allows it to e.g. replace a
>> memcpy(d, s, 1) by byte load+store). It does not mean to prevent
>> emitting calls to foo, and I don't think it ever will - it's a bit sad
>> that clang has chosen to interpret these options differently.
> 
> That documentation is misleading, as it also goes on to say:
> "...nor can you change the behavior of the functions by linking with a
> different library"
> which implies that you _can_ change the behavior if you use the option,
> and which is what your "i.e." is saying as well.
> 
> My point is that this is not completely true: in gcc, foo by default is
> defined to be __builtin_foo, and -fno-builtin-foo simply removes this
> definition. So the effect is just that calls to foo in the original
> source will be left alone.

Yes, this is a much better way of putting it. And with -fbuiltin-foo in
effect, the compiler just needs to transform the code in some way as-if
the standard function by that name was called, which it can of course
decide to implement by emitting such a call, but it can also open-code
it - or synthesize it using other std functions.

> But in order for an optimization that introduces a new call to foo to be
> valid, foo _must_ have standard semantics: strchr(s,'\0') is not s +
> strlen(s) unless strlen has standard semantics.

Correct. So I agree that -fno-builtin-strlen should prevent the compiler
from generating calls to strlen() that don't appear in the code.

> This is an oversight in
> gcc's optimizations: it converts to s + __builtin_strlen(s), which then
> (normally) becomes s + strlen(s).
> 
> Check out this horror: https://godbolt.org/z/a1r9fK
> 
> Clang will disable this optimization if -fno-builtin-strlen is
> specified.
>
> Clang's interpretation is more useful for embedded, since you can use
> -fno-builtin-foo and avoid calling __builtin_foo directly, and be
> guaranteed that there will be no calls to foo that you didn't write
> explicitly (outside of memcpy/memset/memcmp). In this case you are free
> to implement foo with non-standard semantics, or avoid implementing it
> altogether, and be reasonably confident that it will all work.

Yeah, except that the list of -fno-builtin-foo one would have to pass is
enormous, so for targets with a somewhat wonky libc, I'd much rather be
able to do a blanket -fno-builtin, and then manually check their memcpy,
memset and memcmp implementations and opt back in with
-fbuiltin-mem{cpy,set,cmp} so that small constant-size memcpys do get
properly open-coded.
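
For concreteness, the kind of call this matters for (illustrative only;
what actually happens of course depends on compiler version and flags):

  #include <string.h>

  void copy4(char *d, const char *s)
  {
	/*
	 * With builtin recognition enabled this is typically open-coded
	 * as a single 4-byte load/store pair; with -fno-builtin-memcpy
	 * (or a blanket -fno-builtin) the compiler emits a call to
	 * whatever memcpy the environment provides.
	 */
	memcpy(d, s, 4);
  }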

The advice in gcc's documentation of just #defining memcpy() to
__builtin_memcpy() doesn't work in the real world (for example it breaks
C++ code that uses std::memcpy(...)).

>> Thinking out loud, it would be useful if both compilers grew
>>
>>   -fassume-provided-std-foo
>>
>> and
>>
>>   -fno-assume-provided-std-foo
>>
>> options to tell the compiler that a function named foo with standard
>> semantics can be assumed (or not) to be provided by the execution
>> environment; i.e. one half of what -f(no-)builtin-foo apparently does
>> for clang currently.

Re: [PATCH 0/4] -ffreestanding/-fno-builtin-* patches

2020-08-20 Thread Rasmus Villemoes
On 18/08/2020 23.41, Arvind Sankar wrote:
> On Tue, Aug 18, 2020 at 01:58:51PM -0700, Nick Desaulniers wrote:
>> On Tue, Aug 18, 2020 at 1:27 PM Nick Desaulniers
>>  wrote:
>>>
>>> On Tue, Aug 18, 2020 at 1:24 PM Arvind Sankar  wrote:

 On Tue, Aug 18, 2020 at 12:13:22PM -0700, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 12:03 PM H. Peter Anvin  wrote:
>>
>> I'm not saying "change the semantics", nor am I saying that playing
>> whack-a-mole *for a limited time* is unreasonable. But I would like to 
>> go back
>> to the compiler authors and get them to implement such a #pragma: "this
>> freestanding implementation *does* support *this specific library 
>> function*,
>> and you are free to call it."
>
> I'd much rather just see the library functions as builtins that always
> do the right thing (with the fallback being "just call the standard
> function").
>
> IOW, there's nothing wrong with -ffreestanding if you then also have
> __builtin_memcpy() etc, and they do the sane compiler optimizations
> for memcpy().
>
> What we want to avoid is the compiler making *assumptions* based on
> standard names, because we may implement some of those things
> differently.
>

 -ffreestanding as it stands today does have __builtin_memcpy and
 friends. But you need to then use #define memcpy __builtin_memcpy etc,
 which is messy and also doesn't fully express what you want. #pragma, or
 even just allowing -fbuiltin-foo options would be useful.
>>
>> I do really like the idea of -fbuiltin-foo.  For example, you'd specify:
>>
>> -ffreestanding -fbuiltin-bcmp
>>
>> as an example. `-ffreestanding` would opt you out of ALL libcall
>> optimizations, `-fbuiltin-bcmp` would then opt you back in to
>> transforms that produce bcmp.  That way you're informing the compiler
>> more precisely about the environment you'd be targeting.  It feels
>> symmetric to existing `-fno-` flags (clang makes -f vs -fno- pretty
>> easy when there is such symmetry).  And it's already convention that
>> if you specify multiple conflicting compiler flags, then the latter
>> one specified "wins."  In that sense, turning back on specific
>> libcalls after disabling the rest looks more ergonomic to me.
>>
>> Maybe Eli or David have thoughts on why that may or may not be as
>> ergonomic or possible to implement as I imagine?
>>
> 
> Note that -fno-builtin-foo seems to mean slightly different things in
> clang and gcc. From experimentation, clang will neither optimize a call
> to foo, nor perform an optimization that introduces a call to foo. gcc
> will avoid optimizing calls to foo, but it can still generate new calls
> to foo while optimizing something else. Which means that
> -fno-builtin-{bcmp,stpcpy} only solves things for clang, not gcc. It's
> just that gcc doesn't seem to have implemented those optimizations.
> 

I think it's more than that. I've always read gcc's documentation

'-fno-builtin'
'-fno-builtin-FUNCTION'
 Don't recognize built-in functions that do not begin with
 '__builtin_' as prefix. ...

 GCC normally generates special code to handle certain built-in
 functions more efficiently; for instance, calls to 'alloca' may
 become single instructions which adjust the stack directly, and
 calls to 'memcpy' may become inline copy loops.
 ...

to mean exactly that observed above and nothing more, i.e. that
-fno-builtin-foo merely means that gcc stops treating a call of a
function named foo to mean a call to a function implementing the
standard function by that name (and hence allows it to e.g. replace a
memcpy(d, s, 1) by byte load+store). It does not mean to prevent
emitting calls to foo, and I don't think it ever will - it's a bit sad
that clang has chosen to interpret these options differently.

Thinking out loud, it would be useful if both compilers grew

  -fassume-provided-std-foo

and

  -fno-assume-provided-std-foo

options to tell the compiler that a function named foo with standard
semantics can be assumed (or not) to be provided by the execution
environment; i.e. one half of what -f(no-)builtin-foo apparently does
for clang currently.

And yes, the positive -fbuiltin-foo would also be quite useful in order
to get the compiler to recognize a few important functions (memcpy,
memcmp) while using -ffreestanding (or just plain -fno-builtin) to tell
it to avoid assuming anything about most std functions - I've worked on
a VxWorks target where snprintf() didn't have the correct "return what
would be written" semantics but rather behaved like the kernel's
non-standard scnprintf(), and who knows what other odd quirks that libc had.
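
For reference, the difference being referred to (illustrative values;
in the kernel both are implemented in lib/vsprintf.c):

  char buf[4];
  int n;

  n = snprintf(buf, sizeof(buf), "%s", "hello");	/* n == 5, buf == "hel" */
  n = scnprintf(buf, sizeof(buf), "%s", "hello");	/* n == 3, buf == "hel" */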

Rasmus


Re: [PATCH v2] overflow: Add __must_check attribute to check_*() helpers

2020-08-17 Thread Rasmus Villemoes
On 17/08/2020 11.08, David Sterba wrote:
> On Sat, Aug 15, 2020 at 10:09:24AM -0700, Kees Cook wrote:
>>  
>> +/*
>> + * Allows for effectively applying __must_check to a macro so we can have
>> + * both the type-agnostic benefits of the macros while also being able to
>> + * enforce that the return value is, in fact, checked.
>> + */
>> +static inline bool __must_check __must_check_overflow(bool overflow)
>> +{
>> +return unlikely(overflow);
> 
> How does the 'unlikely' hint propagate through return? It is in a static
> inline so compiler has complete information in order to use it, but I'm
> curious if it actually does.

I wondered the same thing, but as I noted in a reply in the v1 thread,
that pattern is used in kernel/sched/, and the scheduler is a far more
critical path than anywhere these might be used, so if it's good enough
for kernel/sched/, it should be good enough here. I have no idea how one
could write a piece of non-trivial code to see if the hint actually
makes a difference.

> 
> In case the hint gets dropped, the fix would probably be
> 
> #define check_add_overflow(a, b, d) unlikely(__must_check_overflow(({ \
>   typeof(a) __a = (a);\
>   typeof(b) __b = (b);\
>   typeof(d) __d = (d);\
>   (void) (&__a == &__b);  \
>   (void) (&__a == __d);   \
>   __builtin_add_overflow(__a, __b, __d);  \
> })))
> 

Well, maybe, but I'd be a little worried that the !! that unlikely()
slaps on its argument may count as a use of that argument, hence
nullifying the __must_check which is the main point - the unlikely just
being something we can add for free while touching this code. Haven't
tested it, though.

Rasmus


Re: [PATCH 08/30] net: wireless: ath: carl9170: Mark 'ar9170_qmap' as __maybe_unused

2020-08-17 Thread Rasmus Villemoes
On 14/08/2020 17.14, Christian Lamparter wrote:
> On 2020-08-14 13:39, Lee Jones wrote:
>> 'ar9170_qmap' is used in some source files which include carl9170.h,
>> but not all of them.  Mark it as __maybe_unused to show that this is
>> not only okay, it's expected.
>>
>> Fixes the following W=1 kernel build warning(s)
> 
> Is this W=1 really a "must" requirement? I find it strange having
> __maybe_unused in header files as this "suggests" that the
> definition is redundant.

In this case it seems one could replace the table lookup with a

static inline u8 ar9170_qmap(u8 idx) { return 3 - idx; }

gcc doesn't warn about unused static inline functions (or one would have
a million warnings to deal with). Just my $0.02.

Rasmus


Re: [PATCH] lib: Convert test_printf.c to KUnit

2020-08-17 Thread Rasmus Villemoes
On 17/08/2020 06.30, Arpitha Raghunandan wrote:
> Converts test lib/test_printf.c to KUnit.
> More information about KUnit can be found at
> https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html.
> KUnit provides a common framework for unit tests in the kernel.

So I can continue to build a kernel with some appropriate CONFIG set to
y, boot it under virt-me, run dmesg and see if I broke printf? That's
what I do now, and I don't want to have to start using some enterprisy
framework.

> diff --git a/lib/test_printf.c b/lib/printf_kunit.c
> similarity index 45%
> rename from lib/test_printf.c
> rename to lib/printf_kunit.c
> index 7ac87f18a10f..68ac5f9b8d28 100644
> --- a/lib/test_printf.c
> +++ b/lib/printf_kunit.c
> @@ -5,6 +5,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -30,79 +31,61 @@
>  #define PAD_SIZE 16
>  #define FILL_CHAR '$'
>  
> -static unsigned total_tests __initdata;
> -static unsigned failed_tests __initdata;
> -static char *test_buffer __initdata;
> -static char *alloced_buffer __initdata;
> +static char *test_buffer;
> +static char *alloced_buffer;
>  
> -static int __printf(4, 0) __init
> -do_test(int bufsize, const char *expect, int elen,
> +static void __printf(5, 0)
> +do_test(struct kunit *kunittest, int bufsize, const char *expect, int elen,
>   const char *fmt, va_list ap)
>  {
>   va_list aq;
>   int ret, written;
>  
> - total_tests++;
> -
>   memset(alloced_buffer, FILL_CHAR, BUF_SIZE + 2*PAD_SIZE);
>   va_copy(aq, ap);
>   ret = vsnprintf(test_buffer, bufsize, fmt, aq);
>   va_end(aq);
>  
> - if (ret != elen) {
> - pr_warn("vsnprintf(buf, %d, \"%s\", ...) returned %d, expected 
> %d\n",
> + KUNIT_EXPECT_EQ_MSG(kunittest, ret, elen,
> + "vsnprintf(buf, %d, \"%s\", ...) returned %d, expected 
> %d\n",
>   bufsize, fmt, ret, elen);
> - return 1;
> - }


IIRC, some of these early returns are required to ensure the following
checks do not fail (as in, potentially crash the kernel) simply because
they go off into the weeds. Please double-check that they are all safe
to continue to perform (though, another reason I might have put them in
is to simply avoid lots of useless collateral).


> - if (memchr_inv(alloced_buffer, FILL_CHAR, PAD_SIZE)) {
> + KUNIT_EXPECT_EQ_MSG(kunittest, memchr_inv(alloced_buffer, FILL_CHAR, 
> PAD_SIZE), NULL,

> - if (memchr_inv(test_buffer, FILL_CHAR, BUF_SIZE + PAD_SIZE)) {
> + KUNIT_EXPECT_FALSE_MSG(kunittest,

> - if (memchr_inv(test_buffer + written + 1, FILL_CHAR, BUF_SIZE + 
> PAD_SIZE - (written + 1))) {
> + KUNIT_EXPECT_FALSE_MSG(kunittest,
> + memchr_inv(test_buffer + written + 1, FILL_CHAR, 
> BUF_SIZE + PAD_SIZE - (written + 1))

Why the inconsistency in what a memchr_inv != NULL check gets converted to?


>  
> -static void __printf(3, 4) __init
> -__test(const char *expect, int elen, const char *fmt, ...)
> +static void __printf(4, 5)
> +__test(struct kunit *kunittest, const char *expect, int elen, const char 
> *fmt, ...)
>  {
>   va_list ap;
>   int rand;
>   char *p;
>  
> - if (elen >= BUF_SIZE) {
> - pr_err("error in test suite: expected output length %d too 
> long. Format was '%s'.\n",
> -elen, fmt);
> - failed_tests++;
> - return;
> - }
> + KUNIT_EXPECT_LT_MSG(kunittest, elen, BUF_SIZE,
> + "error in test suite: expected output length %d too 
> long. Format was '%s'.\n",
> + elen, fmt);

And it's ok to continue with the tests when the test suite itself is
buggy because? [*]

>   va_start(ap, fmt);
>  
> @@ -112,49 +95,46 @@ __test(const char *expect, int elen, const char *fmt, 
> ...)
>* enough and 0), and then we also test that kvasprintf would
>* be able to print it as expected.
>*/
> - failed_tests += do_test(BUF_SIZE, expect, elen, fmt, ap);
> + do_test(kunittest, BUF_SIZE, expect, elen, fmt, ap);
>   rand = 1 + prandom_u32_max(elen+1);
>   /* Since elen < BUF_SIZE, we have 1 <= rand <= BUF_SIZE. */
> - failed_tests += do_test(rand, expect, elen, fmt, ap);

[*] Certainly this invariant gets violated, so we (may) provide do_test
with a buffer size larger than, well, BUF_SIZE.

>  
> -#define test(expect, fmt, ...)   \
> - __test(expect, strlen(expect), fmt, ##__VA_ARGS__)
> +#define test(kunittest, expect, fmt, ...)
> \
> + __test(kunittest, expect, strlen(expect), fmt, ##__VA_ARGS__)
>  
> -static void __init
> -test_basic(void)
> +static void
> +test_basic(struct kunit *kunittest)
>  {
>   /* Work around annoying "warning: zero-length gnu_printf format 
> string". */
>   char nul = '\0';
>  
> - test("", &nul);
> - test("100%"

Re: [PATCH] overflow: Add __must_check attribute to check_*() helpers

2020-08-13 Thread Rasmus Villemoes
On 13/08/2020 13.23, Matthew Wilcox wrote:
> On Wed, Aug 12, 2020 at 02:51:52PM -0700, Kees Cook wrote:
>> +/*
>> + * Allows to effectively us apply __must_check to a macro so we can have
>> + * both the type-agnostic benefits of the macros while also being able to
>> + * enforce that the return value is, in fact, checked.
>> + */
>> +static inline bool __must_check __must_check_bool(bool condition)
>> +{
>> +return unlikely(condition);
>> +}
> 
> I'm fine with the concept, but this is a weirdly-generically-named
> function that has a very specific unlikely() in it.  So I'd call
> this __must_check_overflow() and then it's obvious that overflow is
> unlikely(), whereas it's not obvious that __must_check_bool() is going
> to be unlikely().

Incidentally, __must_check_overflow was what was actually Suggested-by
me - though I didn't think too hard about that name, I certainly agree
with your reasoning.

I still don't know if (un)likely annotations actually matter when used
this way, but at least the same pattern is used in kernel/sched/, so
probably.

Rasmus


Re: [PATCH RT 1/6] signal: Prevent double-free of user struct

2020-08-13 Thread Rasmus Villemoes
On 13/08/2020 03.45, Steven Rostedt wrote:
> 5.4.54-rt33-rc1 stable review patch.
> If anyone has any objections, please let me know.
>

No objections, quite the contrary. I think this should also be applied
to 4.19-rt:

Commit fda31c50292a is also in 4.19.y (as 797479da0ae9), since 4.19.112
and hence also 4.19.112-rt47. For a while we've tried to track down a
hang that at least sometimes manifests quite similarly

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 14 at lib/refcount.c:280 refcount_dec_not_one+0xc0/0xd8
...
Call Trace:
[cf45be10] [c0238258] refcount_dec_not_one+0xc0/0xd8 (unreliable)
[cf45be20] [c02383c8] refcount_dec_and_lock_irqsave+0x20/0xa4
[cf45be40] [c0024a70] free_uid+0x2c/0xa0
[cf45be60] [c00384f0] put_cred_rcu+0x58/0x8c
[cf45be70] [c005f048] rcu_cpu_kthread+0x364/0x49c
[cf45bee0] [c003a0d0] smpboot_thread_fn+0x21c/0x29c
[cf45bf10] [c0036464] kthread+0xe0/0x10c
[cf45bf40] [c000f1cc] ret_from_kernel_thread+0x14/0x1c

But our reproducer is rather complicated and involves cutting power to
neighbouring boards, and takes many minutes to trigger. So I tried
Daniel's reproducer

  sigwaittest -t -a -p 98

and almost immediately got a trace much more similar to the one in the
commit message

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 1526 at lib/refcount.c:280
refcount_dec_not_one+0xc0/0xd8
...
Call Trace:
[cebc9e00] [c0238258] refcount_dec_not_one+0xc0/0xd8 (unreliable)
[cebc9e10] [c02383c8] refcount_dec_and_lock_irqsave+0x20/0xa4
[cebc9e30] [c0024a70] free_uid+0x2c/0xa0
[cebc9e50] [c002574c] dequeue_signal+0x90/0x1a4
[cebc9e80] [c0028f74] sys_rt_sigtimedwait+0x24c/0x288
[cebc9f40] [c000f12c] ret_from_syscall+0x0/0x40

With this patch applied, the sigwaittest has now run for 10 minutes
without problems.

I'll have to run some more tests with our reproducer to see if it really
is the same issue, but even if not, the fact that the sigwaittest fails
should be enough to put this in 4.19-rt.

Three requests (in order of importance):

* pull this into 4.19-rt
* add a note about the sigwaittest reproducer to the commit log
* do publish the rt-rcs in some git repository; that makes it a lot
easier to cherry-pick and test patches

Thanks,
Rasmus


Re: [PATCH] overflow: Add __must_check attribute to check_*() helpers

2020-08-12 Thread Rasmus Villemoes
On 12/08/2020 23.51, Kees Cook wrote:
> Since the destination variable of the check_*_overflow() helpers will
> contain a wrapped value on failure, it would be best to make sure callers
> really did check the return result of the helper. Adjust the macros to use
> a bool-wrapping static inline that is marked with __must_check. This means
> the macros can continue to have their type-agnostic behavior while gaining
> the function attribute (that cannot be applied directly to macros).
> 
> Suggested-by: Rasmus Villemoes 
> Signed-off-by: Kees Cook 
> ---
>  include/linux/overflow.h | 51 +++-
>  1 file changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/overflow.h b/include/linux/overflow.h
> index 93fcef105061..ef7d538c2d08 100644
> --- a/include/linux/overflow.h
> +++ b/include/linux/overflow.h
> @@ -43,6 +43,16 @@
>  #define is_non_negative(a) ((a) > 0 || (a) == 0)
>  #define is_negative(a) (!(is_non_negative(a)))
>  
> +/*
> + * Allows to effectively us apply __must_check to a macro so we can have

word ordering?

> + * both the type-agnostic benefits of the macros while also being able to
> + * enforce that the return value is, in fact, checked.
> + */
> +static inline bool __must_check __must_check_bool(bool condition)
> +{
> + return unlikely(condition);
> +}
> +
>  #ifdef COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW
>  /*
>   * For simplicity and code hygiene, the fallback code below insists on
> @@ -52,32 +62,32 @@
>   * alias for __builtin_add_overflow, but add type checks similar to
>   * below.
>   */
> -#define check_add_overflow(a, b, d) ({   \
> +#define check_add_overflow(a, b, d) __must_check_bool(({ \
>   typeof(a) __a = (a);\
>   typeof(b) __b = (b);\
>   typeof(d) __d = (d);\
>   (void) (&__a == &__b);  \
>   (void) (&__a == __d);   \
>   __builtin_add_overflow(__a, __b, __d);  \
> -})
> +}))

Sorry, I meant to send this before your cooking was done but forgot
about it again. Not a big deal, but it occurred to me it might be better
to rename the existing check_*_overflow to __check_*_overflow (in both
branches of the COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW), and then

#define check_*_overflow(a, b, d)
__must_check_bool(__check_*_overflow(a, b, d))

Mostly because it gives less whitespace churn, but it might also be
handy to have the dunder versions available (if nothing else then
perhaps in some test code).

But as I said, no biggie, I'm fine either way. Now I'm just curious if
0-day is going to find some warning introduced by this :)

Rasmus


Re: [PATCH linux-5.2.y-rt only] hrtimer: correct the logic for grab expiry lock

2020-08-12 Thread Rasmus Villemoes
On 12/08/2020 12.50, zhantao.t...@windriver.com wrote:
> From: Zhantao Tang 
> 
> In commit: 47b6de0b7f22 ("hrtimer: Add a missing bracket and hide 
> `migration_base' on !SMP")
> a inline function is_migration_base() is introduced. But
> the logic of the hrtimer_grab_expiry_lock was changed.
> 
> This patch is to correct it.
> 

Yup, same patch sent back in April, which also had a fixes tag for 5.2.

https://lore.kernel.org/lkml/20200428144026.5882-1-rasmus.villem...@prevas.dk/

It got picked up for 4.19-rt, dunno why it wasn't for 5.2-rt.

Rasmus


Re: [RFC] saturate check_*_overflow() output?

2020-08-05 Thread Rasmus Villemoes
On 04/08/2020 21.23, Kees Cook wrote:
> On Tue, Aug 04, 2020 at 08:11:45AM +0200, Rasmus Villemoes wrote:

>> What we might do, to deal with the "caller fails to check the result",
>> is to add a
>>
>> static inline bool __must_check must_check_overflow(bool b) { return
>> unlikely(b); }
>>
>> and wrap all the final "did it overflow" results in that one - perhaps
>> also for the __builtin_* cases, I don't know if those are automatically
>> equipped with that attribute. [I also don't know if gcc propagates
>> likely/unlikely out to the caller, but it shouldn't hurt to have it
>> there and might improve code gen if it does.]
> 
> (What is the formal name for the ({ ...; return_value; }) C construct?)

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

> Will that work as a macro return value? If so, that's extremely useful.

Yes and no. Just wrapping the last expression in the statement
expression with my must_check_overflow(), as in

@@ -67,17 +72,18 @@
typeof(d) __d = (d);\
(void) (&__a == &__b);  \
(void) (&__a == __d);   \
-   __builtin_sub_overflow(__a, __b, __d);  \
+   must_check_overflow(__builtin_sub_overflow(__a, __b, __d)); \
 })

does not appear to work. For some reason, this can't be (ab)used to
overrule the __must_check quite that simply:

  - kstrtoint(a, b, c);
  + ({ kstrtoint(a, b, c); });

still gives a warning for kstrtoint(). But failing to use the result of
check_sub_overflow() as patched above does not give a warning.

I'm guessing gcc has some internal very early simplification that
replaces single-expression statement-exprs with just that expression,
and the warn-unused-result triggers later. But as soon as the
statement-expr becomes a little non-trivial (e.g. above), my guess is
that the whole thing gets assigned to some internal "variable"
representing the result, and that assignment then counts as a use of the
return value from must_check_overflow() - cc'ing Segher, as he usually
knows these details.

Anyway, we don't need to apply it to the last expression inside ({}), we
can just pass the whole ({}) to must_check_overflow() as in

-#define check_sub_overflow(a, b, d) ({ \
+#define check_sub_overflow(a, b, d) must_check_overflow(({ \
typeof(a) __a = (a);\
typeof(b) __b = (b);\
typeof(d) __d = (d);\
(void) (&__a == &__b);  \
(void) (&__a == __d);   \
__builtin_sub_overflow(__a, __b, __d);  \
-})
+}))

and that's even more natural for the fallback cases which would be

 #define check_sub_overflow(a, b, d)\
+   must_check_overflow(\
__builtin_choose_expr(is_signed_type(typeof(a)),\
__signed_sub_overflow(a, b, d), \
-   __unsigned_sub_overflow(a, b, d))
+   __unsigned_sub_overflow(a, b, d)))

(in all cases with some whitespace reflowing).

Rasmus


Re: [RFC] saturate check_*_overflow() output?

2020-08-03 Thread Rasmus Villemoes
On 03/08/2020 20.29, Kees Cook wrote:
> Hi,
> 
> I wonder if we should explicitly saturate the output of the overflow
> helpers as a side-effect of overflow detection? 

Please no.

> (That way the output
> is never available with a "bad" value, if the caller fails to check the
> result or forgets that *d was written...) since right now, *d will hold
> the wrapped value.

Exactly. I designed the fallback ones so they would have the same
semantics as when using gcc's __builtin_* - though with the "all
operands have same type" restriction, since it would be completely
unwieldy to handle stuff like (s8) + (u64) -> (s32) in macros.

> Also, if we enable arithmetic overflow detection sanitizers, we're going
> to trip over the fallback implementation (since it'll wrap and then do
> the overflow test in the macro).

Huh? The fallback code only ever uses unsigned arithmetic, precisely to
avoid triggering such warnings. Or are you saying there are some
sanitizers out there which also warn for, say, (~0u) + 1u? Yes,
detecting overflow/underflow for a (s32)-(s32)->(s32) without relying on
-fwrapv is a bit messy, but it's done and AFAIK works just fine even
with UBSAN enabled.


What we might do, to deal with the "caller fails to check the result",
is to add a

static inline bool __must_check must_check_overflow(bool b) { return
unlikely(b); }

and wrap all the final "did it overflow" results in that one - perhaps
also for the __builtin_* cases, I don't know if those are automatically
equipped with that attribute. [I also don't know if gcc propagates
likely/unlikely out to the caller, but it shouldn't hurt to have it
there and might improve code gen if it does.]
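
For context, the calling pattern the attribute is meant to police looks
roughly like this (names purely illustrative):

  #include <linux/errno.h>
  #include <linux/overflow.h>

  static int sub_checked(unsigned int a, unsigned int b, unsigned int *res)
  {
	unsigned int d;

	if (check_sub_overflow(a, b, &d))
		return -ERANGE;	/* d holds the wrapped value here - don't use it */
	*res = d;
	return 0;
  }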

Rasmus

PS: Another reason not to saturate is that there are two extreme values,
and choosing between them makes the code very messy (especially when
using the __builtins). 5u-10u should saturate to 0u, not UINT_MAX, and
even for underflowing a signed computation like INT_MIN + (-7), it
makes no sense for that to saturate to INT_MAX.


Re: [PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

2020-07-10 Thread Rasmus Villemoes
On 10/07/2020 10.30, Cyrill Gorcunov wrote:
> On Fri, Jul 10, 2020 at 09:56:31AM +0200, Rasmus Villemoes wrote:
>> The ability to check open file descriptions for equality (without
>> resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
>> useful outside of the checkpoint/restore use case - for example,
>> systemd uses kcmp() to deduplicate the per-service file descriptor
>> store.
>>
>> Make it possible to have the kcmp() syscall without the full
>> CONFIG_CHECKPOINT_RESTORE.
>>
>> Signed-off-by: Rasmus Villemoes 
>> ---
>> I deliberately drop the ifdef in the eventpoll.h header rather than
>> replace with KCMP_SYSCALL; it's harmless to declare a function that
>> isn't defined anywhere.
> 
> Could you please point why setting #fidef KCMP_SYSCALL in eventpoll.h
> is not suitable?

It's just from a general "avoid ifdef clutter if possible" POV. The
conditional declaration of the function doesn't really serve any purpose.

> Still the overall idea is fine for me, thanks!>
> Reviewed-by: Cyrill Gorcunov 

Thank you.

Rasmus




[PATCH] kcmp: add separate Kconfig symbol for kcmp syscall

2020-07-10 Thread Rasmus Villemoes
The ability to check open file descriptions for equality (without
resorting to unreliable fstat() and fcntl(F_GETFL) comparisons) can be
useful outside of the checkpoint/restore use case - for example,
systemd uses kcmp() to deduplicate the per-service file descriptor
store.

Make it possible to have the kcmp() syscall without the full
CONFIG_CHECKPOINT_RESTORE.

Signed-off-by: Rasmus Villemoes 
---
I deliberately drop the ifdef in the eventpoll.h header rather than
replace with KCMP_SYSCALL; it's harmless to declare a function that
isn't defined anywhere.

 fs/eventpoll.c|  4 ++--
 include/linux/eventpoll.h |  2 --
 init/Kconfig  | 11 +++
 kernel/Makefile   |  2 +-
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..b0313ce2df73 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1064,7 +1064,7 @@ static struct epitem *ep_find(struct eventpoll *ep, 
struct file *file, int fd)
return epir;
 }
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
+#ifdef CONFIG_KCMP_SYSCALL
 static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long 
toff)
 {
struct rb_node *rbp;
@@ -1106,7 +1106,7 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, 
int tfd,
 
return file_raw;
 }
-#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_KCMP_SYSCALL */
 
 /**
  * Adds a new entry to the tail of the list in a lockless way, i.e.
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 8f000fada5a4..aa799295e373 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -18,9 +18,7 @@ struct file;
 
 #ifdef CONFIG_EPOLL
 
-#ifdef CONFIG_CHECKPOINT_RESTORE
 struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long 
toff);
-#endif
 
 /* Used to initialize the epoll bits inside the "struct file" */
 static inline void eventpoll_init_file(struct file *file)
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..95e9486d4217 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1158,9 +1158,20 @@ config NET_NS
 
 endif # NAMESPACES
 
+config KCMP_SYSCALL
+   bool "kcmp system call"
+   help
+ Enable the kcmp system call, which allows one to determine
+ whether two tasks share various kernel resources, for example
+ whether they share address space, or if two file descriptors
+ refer to the same open file description.
+
+ If unsure, say N.
+
 config CHECKPOINT_RESTORE
bool "Checkpoint/restore support"
select PROC_CHILDREN
+   select KCMP_SYSCALL
default n
help
  Enables additional kernel features in a sake of checkpoint/restore.
diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..3daedba2146a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -49,7 +49,7 @@ obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
 
-obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
+obj-$(CONFIG_KCMP_SYSCALL) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
-- 
2.23.0



Re: printk of non NULL terminated strings ?

2020-07-09 Thread Rasmus Villemoes
On 09/07/2020 15.26, Joakim Tjernlund wrote:
> On Thu, 2020-07-09 at 14:56 +0200, Andreas Schwab wrote:
>> CAUTION: This email originated from outside of the organization. Do not 
>> click links or open attachments unless you recognize the sender and know the 
>> content is safe.
>>
>>
>> On Jul 09 2020, Joakim Tjernlund wrote:
>>
>>> Is there a format (or other function) that lets me
>>> print strings without an \0 terminator using an explicit length arg instead?
>>
>> Use the precision.
> 
> Looking at that now but have a hard time figuring how to use it, can you give 
> me an example?

Exactly as you'd do in userspace:

  printf("%.*s\n", len, buf)

Of course, vsnprintf() will still stop if it encounters a nul byte
within those first len bytes in buf. And you need len to have type int,
so you may need a cast if you have a size_t or ssize_t or whatnot.

Rasmus


Re: [sched] c3a340f7e7: invalid_opcode:#[##]

2020-06-30 Thread Rasmus Villemoes
On 30/06/2020 14.46, Peter Zijlstra wrote:
> On Mon, Jun 29, 2020 at 08:31:27AM +0800, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-4.9):
>>
>> commit: c3a340f7e7eadac7662ab104ceb16432e5a4c6b2 ("sched: Have 
>> sched_class_highest define by vmlinux.lds.h")
> 
>> [1.840970] kernel BUG at kernel/sched/core.c:6652!
> 
> W T H
> 
> $ readelf -Wa defconfig-build/vmlinux | grep sched_class
> 62931: c1e62d20     0 NOTYPE  GLOBAL DEFAULT    2 __begin_sched_classes
> 65736: c1e62f40    96 OBJECT  GLOBAL DEFAULT    2 stop_sched_class
> 71813: c1e62dc0    96 OBJECT  GLOBAL DEFAULT    2 fair_sched_class
> 78689: c1e62d40    96 OBJECT  GLOBAL DEFAULT    2 idle_sched_class
> 78953: c1e62fa0     0 NOTYPE  GLOBAL DEFAULT    2 __end_sched_classes
> 79090: c1e62e40    96 OBJECT  GLOBAL DEFAULT    2 rt_sched_class
> 79431: c1e62ec0    96 OBJECT  GLOBAL DEFAULT    2 dl_sched_class
> 
> $ printf "%d\n" $((0xc1e62dc0 - 0xc1e62d40))
> 128
> 
> So even though the object is 96 bytes in size, has an explicit 32 byte
> alignment, the array ends up with a stride of 128 bytes !?!?!
> 
> Consistently so with GCC-4.9. Any other GCC I tried does the sane thing.

Does that include gcc 4.8, or is it only "anything newer than 4.9"?

>
> Full patch included below.
> 
> Anybody any clue wth 4.9 is doing crazy things like this?

Perhaps
https://gcc.gnu.org/onlinedocs/gcc-4.9.4/gcc/Variable-Attributes.html#Variable-Attributes:

  When used on a struct, or struct member, the aligned attribute can
only increase the alignment; in order to decrease it, the packed
attribute must be specified as well. When used as part of a typedef, the
aligned attribute can both increase and decrease alignment, and
specifying the packed attribute generates a warning.

is part of the explanation. But this is seriously weird. I don't know
which .config you or the buildbot used, but I took an i386_defconfig
with SMP=n to get a small enough struct sched_class (and disable
retpoline stuff), added

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81640fe0eae8..53c0d3ba62ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -71,6 +71,13 @@ unsigned int sysctl_sched_rt_period = 1000000;

 __read_mostly int scheduler_running;

+void foo(void)
+{
+   extern void bar(int);
+   bar(sizeof(struct sched_class));
+   bar(_Alignof(struct sched_class));
+}
+
 /*
  * part of the period that we allow rt tasks to run in us.
  * default: 0.95s

and apparently _Alignof is only 16:

 2c90 <foo>:
2c90:   55  push   %ebp
2c91:   b8 60 00 00 00  mov    $0x60,%eax
2c96:   89 e5   mov    %esp,%ebp
2c98:   e8 fc ff ff ff  call   2c99 <foo+0x9>
                        2c99: R_386_PC32        bar
2c9d:   b8 10 00 00 00  mov    $0x10,%eax
2ca2:   e8 fc ff ff ff  call   2ca3 <foo+0x13>
                        2ca3: R_386_PC32        bar

Neverthess, readelf -S --wide kernel/sched/fair.o:

Section Headers:
  [Nr] Name  TypeAddr OffSize   ES Flg
Lk Inf Al

  [35] __fair_sched_class PROGBITS 002980 60 00   A
0   0 64

so the section it was put in has an alignment of 64. The generated
assembly is indeed

.globl  fair_sched_class
.section__fair_sched_class,"a",@progbits
.align 64

/me goes brew coffee


Re: [PATCH v3 bpf-next 4/8] printk: add type-printing %pT format specifier which uses BTF

2020-06-23 Thread Rasmus Villemoes
On 23/06/2020 14.07, Alan Maguire wrote:
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index fc8f03c..8f8f5d2 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -618,4 +618,20 @@ static inline void print_hex_dump_debug(const char 
> *prefix_str, int prefix_type,
>  #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len)  \
>   print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true)
>  
> +/**
> + * struct btf_ptr is used for %pT (typed pointer) display; the
> + * additional type string/BTF id are used to render the pointer
> + * data as the appropriate type.
> + */
> +struct btf_ptr {
> + void *ptr;
> + const char *type;
> + u32 id;
> +};
> +
> +#define  BTF_PTR_TYPE(ptrval, typeval) \
> + (&((struct btf_ptr){.ptr = ptrval, .type = #typeval}))
> +
> +#define BTF_PTR_ID(ptrval, idval) \
> + (&((struct btf_ptr){.ptr = ptrval, .id = idval}))

Isn't there some better place to put this than printk.h? Anyway, you
probably want the ptr member to be "const void*", to avoid "... discards
const qualifier" warnings when somebody happens to have a "const struct
foobar *".

>  #endif
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 259e558..c0d209d 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -44,6 +44,7 @@
>  #ifdef CONFIG_BLOCK
>  #include 
>  #endif
> +#include 
>  
>  #include "../mm/internal.h"  /* For the trace_print_flags arrays */
>  
> @@ -2092,6 +2093,87 @@ char *fwnode_string(char *buf, char *end, struct 
> fwnode_handle *fwnode,
>   return widen_string(buf, buf - buf_start, end, spec);
>  }
>  
> +#define btf_modifier_flag(c) (c == 'c' ? BTF_SHOW_COMPACT :  \
> +  c == 'N' ? BTF_SHOW_NONAME :   \
> +  c == 'x' ? BTF_SHOW_PTR_RAW :  \
> +  c == 'u' ? BTF_SHOW_UNSAFE : \
> +  c == '0' ? BTF_SHOW_ZERO : 0)
> +
> +static noinline_for_stack
> +char *btf_string(char *buf, char *end, void *ptr, struct printf_spec spec,
> +  const char *fmt)
> +{
> + struct btf_ptr *bp = (struct btf_ptr *)ptr;
> + u8 btf_kind = BTF_KIND_TYPEDEF;
> + const struct btf_type *t;
> + const struct btf *btf;
> + char *buf_start = buf;
> + const char *btf_type;
> + u64 flags = 0, mod;
> + s32 btf_id;
> +
> + if (check_pointer(&buf, end, ptr, spec))
> + return buf;
> +
> + if (check_pointer(&buf, end, bp->ptr, spec))
> + return buf;
> +
> + while (isalnum(*fmt)) {
> + mod = btf_modifier_flag(*fmt);
> + if (!mod)
> + break;
> + flags |= mod;
> + fmt++;
> + }
> +
> + btf = bpf_get_btf_vmlinux();

AFAICT, this function is only compiled if CONFIG_BPF=y and
CONFIG_BPF_SYSCALL=y, and I don't see any static inline stub defined
anywhere. Have you built the kernel with one or both of those turned off?
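
For completeness, the kind of !CONFIG_BPF_SYSCALL fallback I'd expect to
see next to the existing stubs in btf.h - just a sketch, not tested:

  #ifdef CONFIG_BPF_SYSCALL
  struct btf *bpf_get_btf_vmlinux(void);
  #else
  static inline struct btf *bpf_get_btf_vmlinux(void)
  {
          return NULL;
  }
  #endif

with btf_string() then bailing out early on a NULL btf.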

Rasmus


Re: [PATCH v3 00/21] dynamic_debug cleanups, query features, export

2020-06-17 Thread Rasmus Villemoes
On 17/06/2020 18.25, Jim Cromie wrote:
> this is v3, changes from previous:
>  - moved non-controversial commits up front, refactors to help this.
>  - a few more minor cleanups
>  - left out the WIP patches
>  - export ddebug_exec_queries()
>  - accept file=foo:func only, not module:foo
>  - varname changes
>  
> v2: 
> https://lore.kernel.org/lkml/20200613155738.2249399-1-jim.cro...@gmail.com/
> v1: https://lore.kernel.org/lkml/20200605162645.289174-1-jim.cro...@gmail.com/
> 
> Patchset starts with 11 cleanups;
>  - change section name from vague "__verbose" to "__dyndbg"
>  - cleaner docs, drop obsolete comment & useless debug prints,
>refine verbosity, fix a BUG_ON, ram reporting miscounts. etc..

So I haven't been following too closely, but I'm also a bit skeptical
about the new custom flag thing. OTOH, I'd really like to see those
first cleanups go in soon, especially patch 6 - which not only makes the
ram use a bit more accurate, it also avoids ~1 calls of strlen() on
cache-cold memory during boot.

So, FWIW, you have my Acked-by for patches 1 through 11, and I hope
those can be picked up before the next merge window. But the remaining
ones seem to still require some discussion.

Rasmus


Re: WIP generic module->debug_flags and dynamic_debug

2020-06-11 Thread Rasmus Villemoes
On 10/06/2020 20.32, jim.cro...@gmail.com wrote:
> so Ive got a WIP / broken / partial approach to giving all modules a
> u32 flagset,
> and enabling pr_debug based upon it.  I leave out the "pr_debug_typed(
> bitpos )" for now.  For Stanimir, bits 1,2,3 could be high, middle,
> low.
> 
> ATM its broken on my lack of container_of() skills.
> 
> Im trying to use it to get a struct module* using its name value thats
> been copied
> into a ddebug_table member.
> 
> Im relying on
> cdf6d006968  dynamic_debug: don't duplicate modname in ddebug_add_module
> to have the same value in both structs
> 
> but Im clearly missing a few things
> besides the likely future trouble with .rodata builtin modules
> (after compile prob solved)
> 
> It seems container_of wants me to use struct ddebug_table instead,
> but I dont want a *ddebug_table.
> Ive blindly guessed at adding & and * to 1st param, w/o understanding.
> 
> can anyone diagnose my problem ?

Sorry, I have not the faintest idea of what you're trying to achieve.
Can you spell that out?

Rasmus


must populate_rootfs be synchronous?

2020-06-10 Thread Rasmus Villemoes
I have worked on several boards where there is a rather "aggressive"
external (gpio) watchdog, with timeouts between 250 and 800 ms. Combined
with a rather slow CPU and memory, these boards often fail to make it
through populate_rootfs, since unpacking the initramfs is rather
time-consuming, and we haven't gotten to get the watchdog driver handle
the watchdog. [No, GPIO_WATCHDOG_ARCH_INITCALL doesn't help, at least
not for all cases, probably because the device isn't "discovered" until
mpc83xx_declare_of_platform_devices runs, which is at device_initcall time].

So, those boards currently use some rather ugly patches to make them
boot. I'd like to get rid of those.

I assume there's a good reason populate_rootfs runs between fs_initcall
and device_initcalls - perhaps one of the latter wants some firmware or
do a request_module?

But, would it be possible to throw most of populate_rootfs into a work
item, add some globally visible DECLARE_COMPLETION(initramfs_unpacked)
which is complete_all'ed at the end, and then in the (I assume)
relatively few places that might need to look at the filesystem add a
wait_for_completion(initramfs_unpacked) - including of course right
before the console_on_rootfs() call in kernel_init_freeable() (so also
before we start asking whether there is /init or not).
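
Very roughly, the shape I have in mind (sketch only, invented names,
error handling omitted):

  DECLARE_COMPLETION(initramfs_unpacked);

  static void do_populate_rootfs(struct work_struct *unused)
  {
          /* the current body of populate_rootfs() would move here */
          complete_all(&initramfs_unpacked);
  }
  static DECLARE_WORK(populate_rootfs_work, do_populate_rootfs);

  static int __init populate_rootfs(void)
  {
          schedule_work(&populate_rootfs_work);
          return 0;
  }

with the few users that actually need the initramfs contents doing
wait_for_completion(&initramfs_unpacked) first.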

Rasmus


PS: This is the slowness, .7 seconds to unpack 4MB - it got a bit better
when switching to lz4 compression, but still in the few hundreds of
milliseconds range, which is way too much given the external watchdog's
requirements:

 [0.047970] Trying to unpack rootfs image as initramfs...
 [0.768516] Freeing initrd memory: 3972K


Re: [PATCH resend] fs/namei.c: micro-optimize acl_permission_check

2020-06-08 Thread Rasmus Villemoes
On 07/06/2020 21.48, Linus Torvalds wrote:
> On Sun, Jun 7, 2020 at 9:37 AM Linus Torvalds
>  wrote:
>>
>>> That will kinda work, except you do that mask &= MAY_RWX before
>>> check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits.
>>
>> Good catch.
> 
> With the change to not clear the non-rwx bits in general, the owner
> case now wants to do the masking, and then the "shift left by 6"
> modification makes no sense since it only makes for a bigger constant
> (the only reason to do the shift-right was so that the bitwise not of
> the i_mode could be done in parallel with the shift, but with the
> masking that instruction scheduling optimization becomes kind of
> immaterial too). So I modified that patch to not bother, and add a
> comment about MAY_NOT_BLOCK.
> 
> And since I was looking at the MAY_NOT_BLOCK logic, it was not only
> not mentioned in comments, it also had some really confusing code
> around it.
> 
> The posix_acl_permission() looked like it tried to conserve that bit,
> which is completely wrong. It wasn't a bug only for the simple reason
> that the only two call-sites had either explicitly cleared the bit
> when calling, or had tested that the bit wasn't set in the first
> place.
> 
> So as a result, I wrote a second patch to clear that confusion up.
> 
> Rasmus, say the word and I'll mark you for authorship on the first one.

It might be a bit confusing with me mentioned in the third person and
then also author, and it's really mostly your patch, so reported-by is
fine with me. But it's up to you.

> Comments? Can you find something else wrong here, or some other fixup to do?

No, I think it's ok. I took a look at the disassembly and it looks fine.
There's an extra push/pop of %r14 [that's where gcc computes mode>>3,
then CSE allows it to do cmovne %r14d,%ebx after in_group_p), so the
owner case gets slightly penalized. I think/hope the savings from
avoiding the in_group_p should compensate for that - any absolute path
open() by non-root saves at least two in_group_p. YMMV.

Rasmus


Re: [PATCH resend] fs/namei.c: micro-optimize acl_permission_check

2020-06-07 Thread Rasmus Villemoes
On 05/06/2020 22.18, Linus Torvalds wrote:
> On Fri, Jun 5, 2020 at 7:23 AM Rasmus Villemoes
>  wrote:
>>
>> +   /*
>> +* If the "group" and "other" permissions are the same,
>> +* there's no point calling in_group_p() to decide which
>> +* set to use.
>> +*/
>> +   if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid))
>> mode >>= 3;
> 
> Ugh. Not only is this ugly, but it's not even the best optimization.
>
> We don't care that group and other match exactly. We only care that
> they match in the low 3 bits of the "mask" bits.

Yes, I did think about that, but I thought this was the more obviously
correct approach, and that in practice one only sees the 0X44 and 0X55
cases.

> So if we want this optimization - and it sounds worth it - I think we
> should do it right. But I also think it should be written more
> legibly.
> 
> And the "& 7" is the same "& (MAY_READ | MAY_WRITE | MAY_EXEC)" we do later.
> 
> In other words, if we do this, I'd like it to be done even more
> aggressively, but I'd also like the end result to be a lot more
> readable and have more comments about why we do that odd thing.
> 
> Something like this *UNTESTED* patch, perhaps?

That will kinda work, except you do that mask &= MAY_RWX before
check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits.

> I might have gotten something wrong, so this would need
> double-checking, but if it's right, I find it a _lot_ more easy to
> understand than making one expression that is pretty complicated and
> opaque.

Well, I thought this was readable enough with the added comment. There's
already that magic constant 3 in the shifts, so the 7 seemed entirely
sensible, though one could spell it 0007. Whatever.

Perhaps this? As a whole function, I think that's a bit easier for
brain-storming. It's your patch, just with that rwx thing used instead
of mask, except for the call to check_acl().

static int acl_permission_check(struct inode *inode, int mask)
{
unsigned int mode = inode->i_mode;
unsigned int rwx = mask & (MAY_READ | MAY_WRITE | MAY_EXEC);

/* Are we the owner? If so, ACL's don't matter */
if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
if ((rwx << 6) & ~mode)
return -EACCES;
return 0;
}

/* Do we have ACL's? */
if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
int error = check_acl(inode, mask);
if (error != -EAGAIN)
return error;
}

/*
 * Are the group permissions different from
 * the other permissions in the bits we care
 * about? Need to check group ownership if so.
 */
if (rwx & (mode ^ (mode >> 3))) {
if (in_group_p(inode->i_gid))
mode >>= 3;
}

/* Bits in 'mode' clear that we require? */
return (rwx & ~mode) ? -EACCES : 0;
}

Rasmus


[PATCH resend] fs/namei.c: micro-optimize acl_permission_check

2020-06-05 Thread Rasmus Villemoes
Just something like open(/usr/include/sys/stat.h) causes five calls of
generic_permission -> acl_permission_check -> in_group_p; if the
compiler first tried /usr/local/include/..., that would be a few
more.

Altogether, on a bog-standard Ubuntu 20.04 install, a workload
consisting of compiling lots of userspace programs (i.e., calling lots
of short-lived programs that all need to get their shared libs mapped
in, and the compilers poking around looking for system headers - lots
of /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
0.1% according to perf top. With an artificial load such as

  while true ; do find /usr/ -print0 | xargs -0 stat > /dev/null ; done

that jumps to over 0.4%.

System-installed files are almost always 0755 (directories and
binaries) or 0644, so in most cases, we can avoid the binary search
and the cost of pulling the cred->groups array and in_group_p() .text
into the cpu cache.
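
To spell out the check added below: ((mode >> 3) ^ mode) & 7 is nonzero
exactly when the group and other rwx bits differ, so for the common
modes (illustrative values only):

  mode 0755: group rwx = 5, other rwx = 5, xor = 0 -> skip in_group_p()
  mode 0644: group rwx = 4, other rwx = 4, xor = 0 -> skip in_group_p()
  mode 0750: group rwx = 5, other rwx = 0, xor = 5 -> group check still needed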

Signed-off-by: Rasmus Villemoes 
---
 fs/namei.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index d81f73ff1a8b..c6f0c6643db5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -303,7 +303,12 @@ static int acl_permission_check(struct inode *inode, int 
mask)
return error;
}
 
-   if (in_group_p(inode->i_gid))
+   /*
+* If the "group" and "other" permissions are the same,
+* there's no point calling in_group_p() to decide which
+* set to use.
+*/
+   if ((((mode >> 3) ^ mode) & 7) && in_group_p(inode->i_gid))
mode >>= 3;
}
 
-- 
2.23.0



Re: [RFC][PATCH 0/4] x86/entry: disallow #DB more

2020-05-25 Thread Rasmus Villemoes
On 23/05/2020 23.32, Peter Zijlstra wrote:
> On Sat, May 23, 2020 at 02:59:40PM +0200, Peter Zijlstra wrote:
>> On Fri, May 22, 2020 at 03:13:57PM -0700, Andy Lutomirski wrote:
> 
>> Good point, so the trivial optimization is below. I couldn't find
>> instruction latency numbers for DRn load/stores anywhere. I'm hoping
>> loads are cheap.
> 
> + u64 empty = 0, read = 0, write = 0;
> + unsigned long dr7;
> +
> + for (i=0; i<100; i++) {
> + u64 s;
> +
> + s = rdtsc();
> + barrier_nospec();
> + barrier_nospec();
> + empty += rdtsc() - s;
> +
> + s = rdtsc();
> + barrier_nospec();
> + dr7 = native_get_debugreg(7);
> + barrier_nospec();
> + read += rdtsc() - s;
> +
> + s = rdtsc();
> + barrier_nospec();
> + native_set_debugreg(7, 0);
> + barrier_nospec();
> + write += rdtsc() - s;
> + }
> +
> + printk("XXX: %ld %ld %ld\n", empty, read, write);
> 
> 
> [1.628125] XXX: 2800 2404 19600
> 
> IOW, reading DR7 is basically free, and certainly cheaper than looking
> at cpu_dr7 which would probably be an insta cache miss.
> 

Naive question: did you check disassembly to see whether gcc threw your
native_get_debugreg() away, given that the asm isn't volatile and the
result is not used for anything? Testing here only shows a "mov
%r9,%db7", but the read did seem to get thrown away.

Rasmus


Re: [PATCH] sched/fair: Return true,false in voluntary_active_balance()

2020-05-08 Thread Rasmus Villemoes
On 08/05/2020 10.16, Peter Zijlstra wrote:
> On Thu, May 07, 2020 at 07:06:25PM +0800, Jason Yan wrote:
>> Fix the following coccicheck warning:
>>
>> kernel/sched/fair.c:9375:9-10: WARNING: return of 0/1 in function
>> 'voluntary_active_balance' with return type bool
> 
> That's not a warning, that's a broken cocinelle script, which if these
> stupid patches don't stop, I'm going to delete myself!


I was wondering why I got cc'ed here. Then it dawned on me. What can I
say, I was young.

Ack on nuking it.

Rasmus



Re: [PATCH 5/5] rtc: pcf2127: report battery switch over

2020-05-05 Thread Rasmus Villemoes
On 05/05/2020 22.13, Alexandre Belloni wrote:
> Add support for the RTC_VL_BACKUP_SWITCH flag to report battery switch over
> events.
> 
> Signed-off-by: Alexandre Belloni 
> ---
>  drivers/rtc/rtc-pcf2127.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c
> index 039078029bd4..967de68e1b03 100644
> --- a/drivers/rtc/rtc-pcf2127.c
> +++ b/drivers/rtc/rtc-pcf2127.c
> @@ -188,18 +188,27 @@ static int pcf2127_rtc_ioctl(struct device *dev,
>   unsigned int cmd, unsigned long arg)
>  {
>   struct pcf2127 *pcf2127 = dev_get_drvdata(dev);
> - int touser;
> + int val, touser = 0;
>   int ret;
>  
>   switch (cmd) {
>   case RTC_VL_READ:
> - ret = regmap_read(pcf2127->regmap, PCF2127_REG_CTRL3, &touser);
> + ret = regmap_read(pcf2127->regmap, PCF2127_REG_CTRL3, &val);
>   if (ret)
>   return ret;
>  
> - touser = touser & PCF2127_BIT_CTRL3_BLF ? RTC_VL_BACKUP_LOW : 0;
> + if (val & PCF2127_BIT_CTRL3_BLF)
> + touser = RTC_VL_BACKUP_LOW;
> +
> + if (val & PCF2127_BIT_CTRL3_BF)
> + touser |= RTC_VL_BACKUP_SWITCH;

I think it's a bit easier to read if you use |= in both cases.
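
I.e. something like (untested):

	if (val & PCF2127_BIT_CTRL3_BLF)
		touser |= RTC_VL_BACKUP_LOW;

	if (val & PCF2127_BIT_CTRL3_BF)
		touser |= RTC_VL_BACKUP_SWITCH;

with touser still initialized to 0 as in your patch.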

Re patch 3, one saves a little .text by eliding the ioctl function when,
as you say, it cannot be called anyway. No strong opinion either way, I
don't think anybody actually builds without CONFIG_RTC_INTF_DEV, but
those that do are probably the ones that care about having a tiny vmlinux.

Other than that, the series looks good to me.

Thanks,
Rasmus


Re: battery switch-over detection on pcf2127

2020-05-05 Thread Rasmus Villemoes
On 05/05/2020 22.38, Bruno Thomsen wrote:
> Hi Rasmus
> 
> Den tir. 5. maj 2020 kl. 22.07 skrev Alexandre Belloni
> :
>>
>> On 05/05/2020 21:54:47+0200, Rasmus Villemoes wrote:
>>> Hi Bruno
>>>
>>> I just noticed your "rtc: pcf2127: add tamper detection support"
>>> (03623b4b04) from 5.4. Unfortunately, clearing the BTSE bit breaks a use
>>> case of ours:
>>>
>>> We rely on the battery switch-over detection to distinguish a powerfail
>>> during boot from a PORESET by the external watchdog (in the latter case,
>>> the RTC is still powered throughout, meaning there is no battery
>>> switch-over event). OTOH, we do not use the tamper detection - in fact,
>>> the TS signal is unconnected on our board.
>>>
>>> We're currently still on 4.19, but we will eventually upgrade to a
>>> kernel containing the above commit. So I was wondering if we could
>>> figure out a way that would work for both of us - either some CONFIG
>>> knob, or perhaps something in the device-tree. Any ideas?
>>>
>>
>> Yes, I was working on a patch series last week allowing to read BF. I'm
>> not sure clearing BTSE is your issue but clearing BF is.
>>
>> I'm going to send it tonight, I'll copy you, let me now if that works
>> for you. You can then read BF using the RTC_VL_READ ioctl. The
>> RTC_VL_BACKUP_SWITCH flag will be set if a switchover happened.
>> The RTC_VL_CLR ioctl can be used to clear the flag.
>>
>> I think clearing BTSE is still the right thing to do.
> 
> I think your use case is valid and it sounds like Alexandre solution will
> solve it as you just need to know if a battery switch-over has happened
> not when exactly it happened.
> 
> I can help test the patches too.

Thanks for the quick replies, both. Unfortunately, being able to read BF
from linux is not relevant to us - all the handling happens early in the
bootloader (including clearing BF, so that we can detect that the
previous boot failed only because of power fail - hence whether the
linux driver clears BF or not is not relevant). We really just want
linux to not touch the bits in CTRL3 at all.

Hm, wait. Re-reading the above suggests that BF can get set even if BTSE
is not, and a quick experiment shows that is true - I must have misread
the data sheet. While I think that's fine for now (currently I only
print the time of last switch-over as a diagnostic), I did have some use
case in mind for comparing that timestamp to the current time and make
decisions based on that. But until I figure out exactly what I want to
use it for, and until we actually upgrade to 5.4+, there's no rush.

Thanks,
Rasmus


battery switch-over detection on pcf2127

2020-05-05 Thread Rasmus Villemoes
Hi Bruno

I just noticed your "rtc: pcf2127: add tamper detection support"
(03623b4b04) from 5.4. Unfortunately, clearing the BTSE bit breaks a use
case of ours:

We rely on the battery switch-over detection to distinguish a powerfail
during boot from a PORESET by the external watchdog (in the latter case,
the RTC is still powered throughout, meaning there is no battery
switch-over event). OTOH, we do not use the tamper detection - in fact,
the TS signal is unconnected on our board.

We're currently still on 4.19, but we will eventually upgrade to a
kernel containing the above commit. So I was wondering if we could
figure out a way that would work for both of us - either some CONFIG
knob, or perhaps something in the device-tree. Any ideas?

Rasmus


Re: [PATCH v4 14/18] static_call: Add static_cond_call()

2020-05-05 Thread Rasmus Villemoes
On 04/05/2020 22.14, Peter Zijlstra wrote:
> On Mon, May 04, 2020 at 09:20:03AM +0200, Rasmus Villemoes wrote:
> 
>>
>> Indeed, that is horrible. And it "fixes" the argument evaluation by
>> changing the !HAVE_STATIC_CALL case to match the HAVE_STATIC_CALL, not
>> the other way around,
> 
> Correct; making it the other way is far more 'interesting'. It would
> basically mean combining the static_branch and static_call, but that
> would also make it less optimal for simple forwarding cases.

Yes, I can see how implementing that would be quite hard.

>> which means that it is not a direct equivalent to the
>>
>>   if (foo)
>>  foo(a, b, c)
>>
>> [which pattern of course has the READ_ONCE issue, but each individual
>> existing site with that may be ok for various reasons].
>>
>> Is gcc smart enough to change the if (!func) to a jump across the
>> function call (but still evaluting side effects in args), or is
>> __static_call_nop actually emitted and called?
> 
> I was hoping it would be clever, but I just tried (find below) and it is
> not -- although there's always hoping a newer version / clang might be
> smarter.
> 
> It does indeed emit the nop function :/

Hm.

>> If the latter, then one
>> might as well patch the write-side to do "WRITE_ONCE(foo, func ? :
>> __static_call_nop)" and elide the test from __static_cond_call() - in
>> fact, that just becomes a single READ_ONCE. [There's probably some
>> annoying issue with making sure static initialization of foo points at
>> __static_call_nop].
> 
> But that would not give a more clever compiler the ability to do the
> 'right' thing here..

True. However, I think it's unlikely we'll see a compiler being that
clever anytime soon.

Anyway, it's hard to judge what version of the !HAVE_STATIC_CALL
implementation is best when there's no !HAVE_STATIC_CALL use cases to
look at. I just want to ensure that whatever limitations or gotchas
(e.g., "arguments are evaluated regardless of NULLness of func", or
alternatively "arguments must not have side effects") it ends up with
get documented.

> 
> #define __static_cond_call(name) \
> ({ \
>   void *func = READ_ONCE(name.func); \
>   if (!func) \
>   func = &__static_call_nop; \
>   (typeof(__SCT__##name)*)func; \
> })

I think you can just make it

#define __static_cond_call(name) \
( \
(typeof(__SCT__##name)*) ((void *)name.func ? : (void *)__static_call_nop) \
)

but that simplification is not enough to make gcc change its mind about
how to compile it :( But I'm guessing that various sanitizers or static
checkers might complain about the UB.

Rasmus


Re: [PATCH v4 14/18] static_call: Add static_cond_call()

2020-05-04 Thread Rasmus Villemoes
On 03/05/2020 14.58, Peter Zijlstra wrote:
> On Sat, May 02, 2020 at 03:08:00PM +0200, Rasmus Villemoes wrote:
>> On 01/05/2020 22.29, Peter Zijlstra wrote:

>>> +#define static_cond_call(name) 
>>> \
>>> +   if (STATIC_CALL_KEY(name).func) \
>>> +   ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
>>> +
>>
>> This addresses neither the READ_ONCE issue nor the fact that,
>> AFAICT, the semantics of
>>
>>   static_cond_call(foo)(i++)
>>
>> will depend on CONFIG_HAVE_STATIC_CALL.
> 
> True.
> 
> So there is something utterly terrible we can do to address both:
> 
> void __static_call_nop(void)
> {
> }
> 
> #define __static_cond_call(name) \
> ({ \
>   void *func = READ_ONCE(STATIC_CALL_KEY(name).func); \
>   if (!func) \
>   func = &__static_call_nop; \
>   (typeof(STATIC_CALL_TRAMP(name))*)func; \
> })
> 
> #define static_cond_call(name) (void)__static_cond_call(name)
> 
> This gets us into Undefined Behaviour territory, but it ought to work.
> 
> It adds the READ_ONCE(), and it cures the argument evaluation issue.

Indeed, that is horrible. And it "fixes" the argument evaluation by
changing the !HAVE_STATIC_CALL case to match the HAVE_STATIC_CALL, not
the other way around, which means that it is not a direct equivalent to the

  if (foo)
 foo(a, b, c)

[which pattern of course has the READ_ONCE issue, but each individual
existing site with that may be ok for various reasons].

Is gcc smart enough to change the if (!func) to a jump across the
function call (but still evaluting side effects in args), or is
__static_call_nop actually emitted and called? If the latter, then one
might as well patch the write-side to do "WRITE_ONCE(foo, func ? :
__static_call_nop)" and elide the test from __static_cond_call() - in
fact, that just becomes a single READ_ONCE. [There's probably some
annoying issue with making sure static initialization of foo points at
__static_call_nop].
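
Roughly, with a made-up foo (sketch only; the cast is the same UB
territory as above, and the static-initialization wrinkle is
hand-waved):

  void __static_call_nop(void) { }

  static void (*foo)(int) = (void (*)(int))__static_call_nop;

  static void set_foo(void (*func)(int))
  {
          WRITE_ONCE(foo, func ? func : (void (*)(int))__static_call_nop);
  }

  /* the call site then becomes an unconditional READ_ONCE(foo)(arg); */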

And that brings me to the other issue I raised - do you have a few
examples of call sites that could use this, so we can see disassembly
before/after? I'm still concerned that, even if there are no
side-effects in the arguments, you still force the compiler to
spill/shuffle registers for call/restore unconditionally, whereas with a
good'ol if(), all that work is guarded by the load+test.

Rasmus


Re: [PATCH v4 14/18] static_call: Add static_cond_call()

2020-05-02 Thread Rasmus Villemoes
On 01/05/2020 22.29, Peter Zijlstra wrote:
> Extend the static_call infrastructure to optimize the following common
> pattern:
> 
>   if (func_ptr)
>   func_ptr(args...)
> 
> +
>  #define static_call(name)__static_call(name)
> +#define static_cond_call(name)   (void)__static_call(name)
>  
> +
>  #define static_call(name)__static_call(name)
> +#define static_cond_call(name)   (void)__static_call(name)
>  

> +#define static_cond_call(name)   
> \
> + if (STATIC_CALL_KEY(name).func) \
> + ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
> +

This addresses neither the READ_ONCE issue nor the fact that, AFAICT,
the semantics of

  static_cond_call(foo)(i++)

will depend on CONFIG_HAVE_STATIC_CALL. Also, I'd have appreciated being
cc'ed on new revisions instead of stumbling on it by chance.

Rasmus


Re: [PATCH RT 0/2] Linux v4.19.115-rt50-rc1

2020-04-30 Thread Rasmus Villemoes
On 28/04/2020 21.52, zanu...@kernel.org wrote:
> From: Tom Zanussi 
> 
> Dear RT Folks,
> 
> This is the RT stable review cycle of patch 4.19.115-rt50-rc1.
> 
> Please scream at me if I messed something up. Please test the patches
> too.

For good measure, the customer reports that 4.19.115-rt50-rc1 indeed
seems to fix the boot hang they saw.

Thanks,
Rasmus


Re: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing

2020-04-29 Thread Rasmus Villemoes
On 20/04/2020 18.32, Joe Perches wrote:
> On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:

>>>>    struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
>>>>
>>>>    pr_info("%pTN<struct sk_buff>", skb);
>>>
>>> why follow "TN" convention?
>>> I think "%p" is much more obvious, unambiguous, and
>>> equally easy to parse.
>>>
>>
>> That was my first choice, but the first character
>> after the 'p' in the '%p' specifier signifies the
>> pointer format specifier. If we use '<', and have
>> '%p<', where do we put the modifiers? '%p'
>> seems clunky to me.

There's also the issue that %p followed by alnum has been understood to
be reserved/specially handled by the kernel's printf implementation for
a decade, while other characters have so far been treated as "OK, this
was just a normal %p". A quick grep for %p< only gives a hit in
drivers/scsi/dc395x.c, but there could be others (with field width
modifier between % and p), and in any case I think it's a bad idea to
extend the set of characters that cannot follow %p.

Rasmus


Re: [RFC PATCH bpf-next 5/6] printk: add type-printing %pT format specifier which uses BTF

2020-04-29 Thread Rasmus Villemoes
On 17/04/2020 12.42, Alan Maguire wrote:
> printk supports multiple pointer object type specifiers (printing
> netdev features etc).  Extend this support using BTF to cover
> arbitrary types.  "%pT" specifies the typed format, and a suffix
> enclosed in "<>" specifies the type, for example, specifying
> 
>   printk("%pT<struct sk_buff>", skb)
> 
> ...will utilize BTF information to traverse the struct sk_buff *
> and display it.  Support is present for structs, unions, enums,
> typedefs and core types (though in the latter case there's not
> much value in using this feature of course).
> 
> Default output is compact, specifying values only, but the
> 'N' modifier can be used to show field names to more easily
> track values.  Pointer values are obfuscated as usual.  As
> an example:
> 
>   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
>   pr_info("%pTN<struct sk_buff>", skb);
> 
> ...gives us:
> 
> {{{.next=c7916e9c,.prev=c7916e9c,{.dev=c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=c7916e9c,.rb_left=c7916e9c}|.list={.next=c7916e9c,.prev=c7916e9c}},{.sk=c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=c7916e9c}|.tcp_tsorted_anchor={.next=c7916e9c,.prev=c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co
> 
> printk output is truncated at 1024 bytes.  For such cases, the compact
> display mode (minus the field info) may be used. "|" differentiates
> between different union members.
> 
> Signed-off-by: Alan Maguire 
> ---
>  Documentation/core-api/printk-formats.rst |   8 ++
>  include/linux/btf.h   |   3 +-
>  lib/Kconfig   |  16 
>  lib/vsprintf.c| 145 
> +-
>  4 files changed, 169 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/core-api/printk-formats.rst 
> b/Documentation/core-api/printk-formats.rst
> index 8ebe46b1..b786577 100644
> --- a/Documentation/core-api/printk-formats.rst
> +++ b/Documentation/core-api/printk-formats.rst
> @@ -545,6 +545,14 @@ For printing netdev_features_t.
>  
>  Passed by reference.
>  
> +BTF-based printing of pointer data
> +----------------------------------
> +If '%pT[N]<type>' is specified, use the BPF Type Format (BTF) to
> +show the typed data.  For example, specifying '%pT<struct sk_buff>' will
> +utilize BTF information to traverse the struct sk_buff * and display it.
> +
> +Supported modifier is 'N' (show type field names).
> +
>  Thanks
>  ==
>  
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 2f78dc8..456bd8f 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -158,10 +158,11 @@ static inline const struct btf_member 
> *btf_type_member(const struct btf_type *t)
>   return (const struct btf_member *)(t + 1);
>  }
>  
> +struct btf *btf_parse_vmlinux(void);
> +
>  #ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> -struct btf *btf_parse_vmlinux(void);
>  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
> diff --git a/lib/Kconfig b/lib/Kconfig
> index bc7e563..e92109e 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -6,6 +6,22 @@
>  config BINARY_PRINTF
>   def_bool n
>  
> +config BTF_PRINTF

I don't see any IS_ENABLED(BTF_PRINTF) anywhere in this patch? Shouldn't
the vsprintf.c handler be guarded by that?

> +#define is_btf_fmt_start(c)  (c == 'T')
> +#define is_btf_type_start(c) (c == '<')
> +#define is_btf_type_end(c)   (c == '>')
> +
> +#define btf_modifier_flag(c) (c == 'N' ? BTF_SHOW_NAME : 0)
> +
> +static noinline_for_stack
> +const char *skip_btf_type(const char *fmt, bool *found_btf_type)
> +{
> + *found_btf_type = false;
> +
> + if (!is_btf_fmt_start(*fmt))
> + return fmt;
> + fmt++;
> +
> + while (btf_modifier_flag(*fmt))
> + fmt++;
> +
> + if (!is_btf_type_start(*fmt))
> + return fmt;
> +
> + while (!is_btf_type_end(*fmt) && *fmt != '\0')
> + fmt++;
> +
> + if (is_btf_type_end(*fmt)) {
> + fmt++;
> + *found_btf_type = true;
> + }
> +
> + return fmt;
> +}
> +
> +static noinline_for_stack
> +char *btf_string(char *buf, char *end, void *ptr, struct printf_

[PATCH -rt] hrtimer: fix logic for when grabbing softirq_expiry_lock can be elided

2020-04-28 Thread Rasmus Villemoes
Commit

  hrtimer: Add a missing bracket and hide `migration_base' on !SMP

which is 47b6de0b7f22 in 5.2-rt and 40aae5708e7a in 4.19-rt,
inadvertently changed the logic from base != &migration_base to base
== &migration_base.

On !CONFIG_SMP, the effect was to effectively always elide this
lock/unlock pair (since is_migration_base() is unconditionally false),
which for me consistently causes lockups during reboot, and reportedly
also often causes a hang during boot.

Adding this logical negation (or, what is effectively the same thing
on !CONFIG_SMP, reverting the above commit as well as "hrtimer:
Prevent using hrtimer_grab_expiry_lock() on migration_base") fixes
that lockup.

Fixes: 40aae5708e7a (hrtimer: Add a missing bracket and hide `migration_base' 
on !SMP) # 4.19-rt
Fixes: 47b6de0b7f22 (hrtimer: Add a missing bracket and hide `migration_base' 
on !SMP) # 5.2-rt
Signed-off-by: Rasmus Villemoes 
---
Something like this? I wasn't sure what Fixes: tag(s) to include, if
any. It's quite possible the same fix is needed on earlier -rt
kernels, I didn't check.

 kernel/time/hrtimer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index e54a95de8b79..c3966c090246 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -953,7 +953,7 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer)
 {
struct hrtimer_clock_base *base = READ_ONCE(timer->base);
 
-   if (timer->is_soft && is_migration_base(base)) {
+   if (timer->is_soft && !is_migration_base(base)) {
spin_lock(&base->cpu_base->softirq_expiry_lock);
spin_unlock(&base->cpu_base->softirq_expiry_lock);
}
-- 
2.23.0



Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base

2020-04-28 Thread Rasmus Villemoes
On 28/04/2020 14.59, Tom Zanussi wrote:
> On Tue, 2020-04-28 at 09:03 +0200, Rasmus Villemoes wrote:

>> Hold on a second. This patch (hrtimer: Prevent using
>> hrtimer_grab_expiry_lock() on migration_base) indeed seems to
>> implement
>> the optimization implied by the above, namely avoid the lock/unlock
>> in
>> case base == migration_base:
>>
>>> -   if (timer->is_soft && base && base->cpu_base) {
>>> +   if (timer->is_soft && base != &migration_base) {
>>
>> But the followup patch (hrtimer: Add a missing bracket and hide
>>> `migration_base' on !SMP) to fix the build on !SMP [the missing
>> bracket
>> part seems to have been fixed when backporting the above to 4.19-rt]
>> replaces that logic by
>>
>> +static inline bool is_migration_base(struct hrtimer_clock_base
>> *base)
>> +{
>> +return base == &migration_base;
>> +}
>> +
>> ...
>> -if (timer->is_soft && base != &migration_base) {
>> +if (timer->is_soft && is_migration_base(base)) {
>>
>> in the SMP case, i.e. the exact opposite condition. One of these
>> can't
>> be correct.
>>
>> Assuming the followup patch was wrong and the condition should have
>> read
>>
>>   timer->is_soft && !is_migration_base(base)
>>
>> while keeping is_migration_base() false on !SMP might explain the
>> problem I see. But I'd like someone who knows this code to chime in.
>>
> 
> I don't know this code, but I think you're correct - the followup patch
> reversed the condition by forgetting the !.
> 
> So, does your problem go away when you make that change?

Yes, it does. (I'll have to ask the customer to check in their setup
whether the boot hang also vanishes).

Essentially, adding that ! is equivalent to reverting the two patches on
!SMP (which I also tested): Before, the condition was

  timer->is_soft && base && base->cpu_base

and, assuming the NULL pointer checks are indeed redundant, that's the
same as "timer->is_soft". Appending " && !is_migration_base()" to that,
with is_migration_base() always false as on !SMP, doesn't change anything.

Thanks,
Rasmus


Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base

2020-04-28 Thread Rasmus Villemoes
On 23/01/2020 21.39, Steven Rostedt wrote:
> 4.19.94-rt39-rc2 stable review patch.
> If anyone has any objections, please let me know.
>
> --
> 
> From: Julien Grall 
> 
> [ Upstream commit cef1b87f98823af923a386f3f69149acb212d4a1 ]
> 
> As tglx puts it:
> |If base == migration_base then there is no point to lock soft_expiry_lock
> |simply because the timer is not executing the callback in soft irq context
> |and the whole lock/unlock dance can be avoided.

Hold on a second. This patch (hrtimer: Prevent using
hrtimer_grab_expiry_lock() on migration_base) indeed seems to implement
the optimization implied by the above, namely avoid the lock/unlock in
case base == migration_base:

> - if (timer->is_soft && base && base->cpu_base) {
> + if (timer->is_soft && base != &migration_base) {

But the followup patch (hrtimer: Add a missing bracket and hide
`migration_base' on !SMP) to fix the build on !SMP [the missing bracket
part seems to have been fixed when backporting the above to 4.19-rt]
replaces that logic by

+static inline bool is_migration_base(struct hrtimer_clock_base *base)
+{
+   return base == &migration_base;
+}
+
...
-   if (timer->is_soft && base != &migration_base) {
+   if (timer->is_soft && is_migration_base(base)) {

in the SMP case, i.e. the exact opposite condition. One of these can't
be correct.

Assuming the followup patch was wrong and the condition should have read

  timer->is_soft && !is_migration_base(base)

while keeping is_migration_base() false on !SMP might explain the
problem I see. But I'd like someone who knows this code to chime in.

Thanks,
Rasmus


Re: [PATCH RT 10/30] hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base

2020-04-27 Thread Rasmus Villemoes
On 27/04/2020 21.26, Tom Zanussi wrote:
> On Mon, 2020-04-27 at 15:06 -0400, Steven Rostedt wrote:
>> On Mon, 27 Apr 2020 15:10:00 +0200
>> Rasmus Villemoes  wrote:
>>
>>> Reverting
>>>
>>> b1a471ec4df1 - hrtimer: Prevent using hrtimer_grab_expiry_lock() on
>>> migration_base
>>> 40aae5708e7a - hrtimer: Add a missing bracket and hide
>>> `migration_base'
>>> on !SMP
>>>
>>> on top of v4.19.94-rt39 makes that problem go away, i.e. the board
>>> reboots as expected.
>>>
>> Thanks Rasmus for looking into this. Tom now maintains 4.19-rt.
>>
>> Tom, care to pull in these patches on top of 4.19-rt?
>>
> 
> Those patches are already in 4.19-rt - he's saying that reverting them
> fixes the problem.
> 
> I'm guessing that the assumption of base or base->cpu_base always being
> non-NULL in those patches might be wrong.  If so, the below patch
> should fix the problem:
> 
> Subject: [PATCH] hrtimer: Add back base and base->cpu_base checks in
>  hrtimer_grab_expiry_lock()
> 
> 4.19 commit b1a471ec4df1 [hrtimer: Prevent using
> hrtimer_grab_expiry_lock() on migration_base] removed the NULL checks
> for timer->base and timer->base->cpu_base on the assumption that
> they're always non-NULL.  That assumption is apparently not to be
> true, so add the checks back.
> 
> Signed-off-by: Tom Zanussi 
> ---
>  kernel/time/hrtimer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index e54a95de8b79..6f20cf23008b 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -953,7 +953,7 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer)
>  {
>   struct hrtimer_clock_base *base = READ_ONCE(timer->base);
>  
> - if (timer->is_soft && is_migration_base(base)) {
> + if (timer->is_soft && base && base->cpu_base && 
> is_migration_base(base)) {

I'm sorry, but no, I don't think that can be it. For !SMP (my case),
is_migration_base() is always false, so with or without the above, the
whole if() is false. Also, the symptoms do not look like a NULL pointer
deref, but more like a dead (or live) lock - so I'm guessing (and that's
just a wild guess) that the lock/unlock sequence is needed to give some
other thread a priority boost to make the whole machine make forward
progress.

Rasmus


Re: [PATCH v4] string-choice: add yesno(), onoff(), enableddisabled(), plural() helpers

2019-10-23 Thread Rasmus Villemoes
On 23/10/2019 15.13, Jani Nikula wrote:
> The kernel has plenty of ternary operators to choose between constant
> strings, such as condition ? "yes" : "no", as well as value == 1 ? "" :
> "s":
> 
> 
> v4: Massaged commit message about space savings to make it less fluffy
> based on Rasmus' feedback.

Thanks, it looks good to me. FWIW,

Acked-by: Rasmus Villemoes 



[PATCH] powerpc/85xx: remove mostly pointless mpc85xx_qe_init()

2019-10-23 Thread Rasmus Villemoes
Since commit 302c059f2e7b (QE: use subsys_initcall to init qe),
mpc85xx_qe_init() has done nothing apart from possibly emitting a
pr_err(). As part of reducing the amount of QE-related code in
arch/powerpc/ (and eventually support QE on other architectures),
remove this low-hanging fruit.

Signed-off-by: Rasmus Villemoes 
---
 arch/powerpc/platforms/85xx/common.c  | 23 ---
 arch/powerpc/platforms/85xx/corenet_generic.c |  2 --
 arch/powerpc/platforms/85xx/mpc85xx.h |  2 --
 arch/powerpc/platforms/85xx/mpc85xx_mds.c |  1 -
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  1 -
 arch/powerpc/platforms/85xx/twr_p102x.c   |  1 -
 6 files changed, 30 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/common.c 
b/arch/powerpc/platforms/85xx/common.c
index fe0606439b5a..a554b6d87cf7 100644
--- a/arch/powerpc/platforms/85xx/common.c
+++ b/arch/powerpc/platforms/85xx/common.c
@@ -86,29 +86,6 @@ void __init mpc85xx_cpm2_pic_init(void)
 #endif
 
 #ifdef CONFIG_QUICC_ENGINE
-void __init mpc85xx_qe_init(void)
-{
-   struct device_node *np;
-
-   np = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!np) {
-   np = of_find_node_by_name(NULL, "qe");
-   if (!np) {
-   pr_err("%s: Could not find Quicc Engine node\n",
-   __func__);
-   return;
-   }
-   }
-
-   if (!of_device_is_available(np)) {
-   of_node_put(np);
-   return;
-   }
-
-   of_node_put(np);
-
-}
-
 void __init mpc85xx_qe_par_io_init(void)
 {
struct device_node *np;
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index 7ee2c6628f64..a328a741b457 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -66,8 +66,6 @@ void __init corenet_gen_setup_arch(void)
swiotlb_detect_4g();
 
pr_info("%s board\n", ppc_md.name);
-
-   mpc85xx_qe_init();
 }
 
 static const struct of_device_id of_device_ids[] = {
diff --git a/arch/powerpc/platforms/85xx/mpc85xx.h 
b/arch/powerpc/platforms/85xx/mpc85xx.h
index fa23f9b0592c..cb84c5c56c36 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx.h
+++ b/arch/powerpc/platforms/85xx/mpc85xx.h
@@ -10,10 +10,8 @@ static inline void __init mpc85xx_cpm2_pic_init(void) {}
 #endif /* CONFIG_CPM2 */
 
 #ifdef CONFIG_QUICC_ENGINE
-extern void mpc85xx_qe_init(void);
 extern void mpc85xx_qe_par_io_init(void);
 #else
-static inline void __init mpc85xx_qe_init(void) {}
 static inline void __init mpc85xx_qe_par_io_init(void) {}
 #endif
 
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_mds.c 
b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
index 5ca254256c47..120633f99ea6 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_mds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
@@ -238,7 +238,6 @@ static void __init mpc85xx_mds_qe_init(void)
 {
struct device_node *np;
 
-   mpc85xx_qe_init();
mpc85xx_qe_par_io_init();
mpc85xx_mds_reset_ucc_phys();
 
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c 
b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index d3c540ee558f..7f9a84f85766 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -89,7 +89,6 @@ static void __init mpc85xx_rdb_setup_arch(void)
fsl_pci_assign_primary();
 
 #ifdef CONFIG_QUICC_ENGINE
-   mpc85xx_qe_init();
mpc85xx_qe_par_io_init();
 #if defined(CONFIG_UCC_GETH) || defined(CONFIG_SERIAL_QE)
if (machine_is(p1025_rdb)) {
diff --git a/arch/powerpc/platforms/85xx/twr_p102x.c 
b/arch/powerpc/platforms/85xx/twr_p102x.c
index 720b0c0f03ba..6c3c0cdaee9a 100644
--- a/arch/powerpc/platforms/85xx/twr_p102x.c
+++ b/arch/powerpc/platforms/85xx/twr_p102x.c
@@ -72,7 +72,6 @@ static void __init twr_p1025_setup_arch(void)
fsl_pci_assign_primary();
 
 #ifdef CONFIG_QUICC_ENGINE
-   mpc85xx_qe_init();
mpc85xx_qe_par_io_init();
 
 #if IS_ENABLED(CONFIG_UCC_GETH) || IS_ENABLED(CONFIG_SERIAL_QE)
-- 
2.23.0



Re: [PATCH 3/7] soc: fsl: qe: avoid ppc-specific io accessors

2019-10-23 Thread Rasmus Villemoes
On 22/10/2019 17.01, Christophe Leroy wrote:
> 
> 
> On 10/18/2019 12:52 PM, Rasmus Villemoes wrote:
>> In preparation for allowing to build QE support for architectures
>> other than PPC, replace the ppc-specific io accessors. Done via
>>
> 
> This patch is not transparent in terms of performance, functions get
> changed significantly.
> 
> Before the patch:
> 
> 0330 :
>  330:    81 43 00 04 lwz r10,4(r3)
>  334:    7c 00 04 ac hwsync
>  338:    81 2a 00 00 lwz r9,0(r10)
>  33c:    0c 09 00 00 twi 0,r9,0
>  340:    4c 00 01 2c isync
>  344:    70 88 00 02 andi.   r8,r4,2
>  348:    41 82 00 10 beq 358 
>  34c:    39 00 00 01 li  r8,1
>  350:    91 03 00 10 stw r8,16(r3)
>  354:    61 29 00 10 ori r9,r9,16
>  358:    70 88 00 01 andi.   r8,r4,1
>  35c:    41 82 00 10 beq 36c 
>  360:    39 00 00 01 li  r8,1
>  364:    91 03 00 14 stw r8,20(r3)
>  368:    61 29 00 20 ori r9,r9,32
>  36c:    7c 00 04 ac hwsync
>  370:    91 2a 00 00 stw r9,0(r10)
>  374:    4e 80 00 20 blr
> 
> After the patch:
> 
> 030c :
>  30c:    94 21 ff e0 stwu    r1,-32(r1)
>  310:    7c 08 02 a6 mflr    r0
>  314:    bf a1 00 14 stmw    r29,20(r1)
>  318:    7c 9f 23 78 mr  r31,r4
>  31c:    90 01 00 24 stw r0,36(r1)
>  320:    7c 7e 1b 78 mr  r30,r3
>  324:    83 a3 00 04 lwz r29,4(r3)
>  328:    7f a3 eb 78 mr  r3,r29
>  32c:    48 00 00 01 bl  32c 
>     32c: R_PPC_REL24    ioread32be
>  330:    73 e9 00 02 andi.   r9,r31,2
>  334:    41 82 00 10 beq 344 
>  338:    39 20 00 01 li  r9,1
>  33c:    91 3e 00 10 stw r9,16(r30)
>  340:    60 63 00 10 ori r3,r3,16
>  344:    73 e9 00 01 andi.   r9,r31,1
>  348:    41 82 00 10 beq 358 
>  34c:    39 20 00 01 li  r9,1
>  350:    91 3e 00 14 stw r9,20(r30)
>  354:    60 63 00 20 ori r3,r3,32
>  358:    80 01 00 24 lwz r0,36(r1)
>  35c:    7f a4 eb 78 mr  r4,r29
>  360:    bb a1 00 14 lmw r29,20(r1)
>  364:    7c 08 03 a6 mtlr    r0
>  368:    38 21 00 20 addi    r1,r1,32
>  36c:    48 00 00 00 b   36c 
>     36c: R_PPC_REL24    iowrite32be

True. Do you know why powerpc uses out-of-line versions of these
accessors when !PPC_INDIRECT_PIO, i.e. at least all of PPC32? It's quite
a bit beyond the scope of this series, but I'd expect moving most if not
all of arch/powerpc/kernel/iomap.c into asm/io.h (guarded by
!defined(CONFIG_PPC_INDIRECT_PIO) of course) as static inlines would
benefit all ppc32 users of iowrite32 and friends.
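
Something like this is what I have in mind for the !PPC_INDIRECT_PIO
case (hand-wavy sketch, ignoring how it would have to interact with the
existing asm-generic/iomap.h declarations):

  static inline unsigned int ioread32be(const void __iomem *addr)
  {
          return in_be32(addr);
  }

  static inline void iowrite32be(u32 val, void __iomem *addr)
  {
          out_be32(addr, val);
  }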

Is there some other primitive available that (a) is defined on all
architectures (or at least both ppc and arm) and (b) expands to good
code in both/all cases?

Note that a few uses of the the iowrite32be accessors has already
appeared in the qe code with the introduction of the qe_clrsetbits()
helpers in bb8b2062af.

Rasmus


Re: [PATCH 0/7] towards QE support on ARM

2019-10-22 Thread Rasmus Villemoes
On 22/10/2019 04.24, Qiang Zhao wrote:
> On Mon, Oct 22, 2019 at 6:11 AM Leo Li wrote

>> Right.  I'm really interested in getting this applied to my tree and make it
>> upstream.  Zhao Qiang, can you help to review Rasmus's patches and
>> comment?
> 
> As you know, I maintained a similar patchset removing PPC, and someone told 
> me qe_ic should moved into drivers/irqchip/.
> I also thought qe_ic is a interrupt control driver, should be moved into dir 
> irqchip.

Yes, and I also plan to do that at some point. However, that's
orthogonal to making the driver build on ARM, so I don't want to mix the
two. Making it usable on ARM is my/our priority currently.

I'd appreciate your input on my patches.

Rasmus



Re: [PATCH 0/7] towards QE support on ARM

2019-10-22 Thread Rasmus Villemoes
On 22/10/2019 00.11, Li Yang wrote:
> On Mon, Oct 21, 2019 at 3:46 AM Rasmus Villemoes
>  wrote:
>>

>>> Can you try the 4.14 branch from a newer LSDK release?  LS1021a should
>>> be supported platform on LSDK.  If it is broken, something is wrong.
>>
>> What newer release? LSDK-18.06-V4.14 is the latest -V4.14 tag at
>> https://github.com/qoriq-open-source/linux.git, and identical to the
> 
> That tree has been abandoned for a while, we probably should state
> that in the github.  The latest tree can be found at
> https://source.codeaurora.org/external/qoriq/qoriq-components/linux/

Ah. FYI, googling "LSDK" gives https://lsdk.github.io as one of the
first hits, and (apart from itself being a github url) that says on the
front page "Disaggregated components of LSDK are available in github.".
But yes, navigating to the Components tab and from there to lsdk linux
one does get directed at codeaurora.

>> In any case, we have zero interest in running an NXP kernel. Maybe I
>> should clarify what I meant by "based on commits from" above: We're
>> currently running a mainline 4.14 kernel on LS1021A, with a few patches
>> inspired from the NXP 4.1 branch applied on top - but also with some
>> manual fixes for e.g. the pvr_version_is() issue. Now we want to move
>> that to a 4.19-based kernel (so that it aligns with our MPC8309 platform).
> 
> We also provide 4.19 based kernel in the codeaurora repo.  I think it
> will be helpful to reuse patches there if you want to make your own
> tree.

Again, we don't want to run off an NXP kernel, we want to get the
necessary pieces upstream. For now, we have to live with a patched 4.19
kernel, but hopefully by the time we switch to 5.x (for some x >= 5) we
don't need to supply anything other than our own .dts and defconfig.

>> Yes, as I said, I wanted to try a fresh approach since Zhao
>> Qiang's patches seemed to be getting nowhere. Splitting the patches into
>> smaller pieces is definitely part of that - for example, the completely
>> trivial whitespace fix in patch 1 is to make sure the later coccinelle
>> generated patch is precisely that (i.e., a later respin can just rerun
>> the coccinelle script, with zero manual fixups). I also want to avoid
>> mixing the ppcism cleanups with other things (e.g. replacing some
>> of_get_property() by of_property_read_u32()). And the "testing on ARM"
>> part comes once I get to actually building on ARM. But there's not much
>> point doing all that unless there's some indication that this can be
>> applied to some tree that actually feeds into Linus', which is why I
>> started with a few trivial patches and precisely to start this discussion.
> 
> Right.  I'm really interested in getting this applied to my tree and
> make it upstream.  Zhao Qiang, can you help to review Rasmus's patches
> and comment?

Thanks, this is exactly what I was hoping for. Even just getting these
first rather trivial patches (in that they don't attempt to build for
ARM, or change functionality at all for PPC) merged for 5.5 would reduce
the amount of out-of-tree patches that we (and NXP for that matter)
would have to carry. I'll take the above as a go-ahead for me to try to
post more patches working towards enabling some of the QE drivers for ARM.

Rasmus

