Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-17 Thread Matthew Wilcox
On Wed, May 08, 2024 at 11:11:35PM -0700, Kees Cook wrote:
>   Or even just simple math between u8s:
> 
>   while (xas->xa_offset == 255) {
>   xas->xa_offset = xas->xa_node->offset - 1;
> 
>   Is it expecting to wrap (truncate)? How is it distinguished from
>   the "num_elems++" case?

Hi ;-)

This looks to me like it's walking up the tree, looking to go to the
'prev' element.  So yes, the wrapping from 0 to 255 is intentional,
because once we find an offset that's not 0, we've set offset to the
right one.  Just to give more context, here's the few lines around it:

if (xas->xa_offset != get_offset(xas->xa_index, xas->xa_node))
xas->xa_offset--;

while (xas->xa_offset == 255) {
xas->xa_offset = xas->xa_node->offset - 1;
xas->xa_node = xa_parent(xas->xa, xas->xa_node);
if (!xas->xa_node)
return set_bounds(xas);
}

So it's kind of nasty.  I don't want to write it as:

while (xas->xa_offset == 0) {
xas->xa_offset = xas->xa_node->offset;
xas->xa_node = xa_parent(xas->xa, xas->xa_node);
}
xas->xa_offset--;

because there's that conditional subtraction before the loop starts.
The xarray test-suite is pretty good if anyone thinks they have a clearer
way to write this loop ...



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-17 Thread Theodore Ts'o
On Fri, May 17, 2024 at 02:15:01PM -0700, Kees Cook wrote:
> On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote:
> > On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote:
> > > 
> > > It is incredibly important that the exact opposite approach is taken;
> > > we need to be annotating (or adding type qualifiers to) the _expected_
> > > overflow cases. The omniscience required to go and properly annotate
> > > all the spots that will cause problems would suggest that we should
> > > just fix the bug outright. If only it was that easy.
> > 
> > It certainly isn't easy, yes.  But the problem is when you dump a huge
> > amount of work and pain onto kernel developers, when they haven't
> > signed up for it, when they don't necessarily have the time to do all
> > of the work themselves, and when their corporate overlords won't given
> > them the headcount to handle unfunded mandates which folks who are
> > looking for a bright new wonderful future --- don't be surprised if
> > kernel developers push back hard.
> 
> I never claimed this to be some kind of "everyone has to stop and make
> these changes" event. I even talked about how it would be gradual and
> not break existing code (all "WARN and continue anyway"). This is what
> we've been doing with the array bounds work. Lots of people are helping
> with that, but a lot of those patches have been from Gustavo and me; we
> tried to keep the burden away from other developers as much as we could.

Kees, sorry, I wasn't intending to be criticism of your work; I know
you've been careful.  What I was reacting was Justin's comment that
it's important to annotate every single expected overflow case.  This
sounded like the very common attitude of "type II errors must be
avoided at all costs", even in that means a huge number of "type I
errors".  And false positive errors invariably end up requiring work
on the part of over-worked maintainers.

>From my perspective, it's OK if we don't find all possible security
bugs if we can keep the false negative rate down to near-zero.
Because if it's too high, I will just start ignoring all of the
"security reports", just out of self-defense.  This is now the case
with clusterfuzz reports example.  The attemp[t to shame/pressure me
with "fix this RIGHT NOW or we will make the report public in 30 days"
no longer works on me, because the vast majority of the reports
against e2fsprogs are bullshit.  And there are many syzkaller reports
which are also bullshit, and when they are reported against our data
center kernels, they get down-prioritized to P3 and ignored, because
they can't happen in real life.  But they still get reported to
upstream (me).

This is why I am worried when there are proposals to add new
sanitizers; even if they are used responsibly by you, I can easily see
them resulting in more clusterfuzz reports (for example) that I will
have to ignore as bullshit.  So please considerr this a plea to
**really** seriously reduce the false positive rate of sanitizers
whenever possible.

Cheers,

- Ted



Re: [PATCH v2] ntp: safeguard against time_constant overflow case

2024-05-17 Thread Thomas Gleixner
Justin!

On Fri, May 17 2024 at 00:47, Justin Stitt wrote:
>   if (txc->modes & ADJ_TIMECONST) {
> - time_constant = txc->constant;
> - if (!(time_status & STA_NANO))
> + if (!(time_status & STA_NANO) && time_constant < MAXTC)
>   time_constant += 4;
>   time_constant = min(time_constant, (long)MAXTC);
>   time_constant = max(time_constant, 0l);

Let me digest this.

The original code does:

time_constant = txc->constant;
if (!(time_status & STA_NANO))
time_constant += 4;
time_constant = min(time_constant, (long)MAXTC);
time_constant = max(time_constant, 0l);

Your change results in:

if (!(time_status & STA_NANO) && time_constant < MAXTC)
time_constant += 4;
time_constant = min(time_constant, (long)MAXTC);
time_constant = max(time_constant, 0l);

IOW, you lost the intent of the code to assign the user space supplied
value of txc->constant.

Aside of that you clearly failed to map the deep analysis I provided to
you vs. the time_maxerror issue to this one:

# git grep 'time_constant.*=' kernel/time/
ntp.c:66:static longtime_constant = 2;

  That's the static initializer

kernel/time/ntp.c:736:  time_constant = txc->constant;
kernel/time/ntp.c:738:  time_constant += 4;
kernel/time/ntp.c:739:  time_constant = min(time_constant, 
(long)MAXTC);
kernel/time/ntp.c:740:  time_constant = max(time_constant, 0l);

  That's the part of process_adjtimex_modes() you are trying to
  "fix". So it's exactly the same problem as with time_maxerror, no?

And therefore you provide a "safeguard" against overflow for the price of
making the syscall disfunctional. Seriously?

Did you even try to run something else than the bad case reproducer
against your fix?

No. You did not. Any of the related real use case tests would have
failed.

I told you yesterday:

   Tools are good to pin-point symptoms, but they are by definition
   patently bad in root cause analysis. Otherwise we could just let the
   tool write the "fix".

Such a tool would have at least produced a correct "fix" to cure the
symptom.

Thanks,

tglx



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-17 Thread Fangrui Song
On Thu, May 16, 2024 at 12:49 PM Justin Stitt  wrote:
>
> Hi,
>
> On Thu, May 16, 2024 at 7:09 AM Peter Zijlstra  wrote:
> >
> > On Thu, May 16, 2024 at 06:30:32AM -0700, Kees Cook wrote:
> > >
> > > I am a broken record. :) This is _not_ about undefined behavior.
> >
> > And yet you introduced CONFIG_UBSAN_SIGNED_WRAP... *UB*san, get it?
>
> We should think of UBSAN as an "Unexpected Behavior" Sanitizer. Clang
> has evolved to provide instrumentation for things that are not
> *technically* undefined behavior.
>
> Go to [1] and grep for some phrases like "not undefined behavior" and
> see that we have quite a few sanitizers that aren't *technically*
> handling undefined behavior.
>
> >
> > > This is about finding a way to make the intent of C authors
> > > unambiguous. That overflow wraps is well defined. It is not always
> > > _desired_. C has no way to distinguish between the two cases.
> >
> > The current semantics are (and have been for years, decades at this
> > point) that everything wraps nicely and code has been assuming this. You
> > cannot just change this.
>
> Why not :>)
>
> Lots and lots of exploits are caused by unintentional arithmetic overflow [2].
>
> >
> > So what you do is do a proper language extension and add a type
> > qualifier that makes overflows trap and annotate all them cases where
> > people do not expect overflows (so that we can put the
> > __builtin_*_overflow() things where the sun don't shine).
>
> It is incredibly important that the exact opposite approach is taken;
> we need to be annotating (or adding type qualifiers to) the _expected_
> overflow cases. The omniscience required to go and properly annotate
> all the spots that will cause problems would suggest that we should
> just fix the bug outright. If only it was that easy.
>
> I don't think we're capable of identifying every single problematic
> overflow/wraparound case in the kernel, this is pretty obvious
> considering we've had decades to do so. Instead, it seems much more
> feasible that we annotate (very, very minimally so as not to disrupt
> code readability and style) the spots where we _know_ overflow should
> happen.
>
> [1]: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#ubsan-checks
> [2]: https://cwe.mitre.org/data/definitions/190.html
>
> Thanks
> Justin

FWIW I have made -fsanitize=undefined -fwrapv not imply
-fsanitize=signed-integer-overflow in
https://github.com/llvm/llvm-project/pull/85501 .
The change is not included in the LLVM 18.1.* releases.
This might encourage projects using both -fwrapv and
-fsanitize=undefined to re-evaluate whether -fwrapv aligns with their
needs.

The description of #85501 contains more information about my
understanding of kernel security folks' mind:

> Linux kernel uses -fwrapv to change signed integer overflows from undefined 
> behaviors to defined behaviors. However, the security folks still want 
> -fsanitize=signed-integer-overflow diagnostics. Their intention can be 
> expressed with -fwrapv -fsanitize=signed-integer-overflow (#80089). This mode 
> by default reports recoverable errors while still making signed integer 
> overflows defined (most UBSan checks are recoverable by default: you get 
> errors in stderr, but the  program is not halted).


-- 
宋方睿



Re: [PATCH v3] fs: fix unintentional arithmetic wraparound in offset calculation

2024-05-17 Thread Kees Cook
On Fri, May 17, 2024 at 02:26:47AM +0100, Al Viro wrote:
> On Fri, May 17, 2024 at 02:13:22AM +0100, Matthew Wilcox wrote:
> > On Fri, May 17, 2024 at 12:29:06AM +, Justin Stitt wrote:
> > > When running syzkaller with the newly reintroduced signed integer
> > > overflow sanitizer we encounter this report:
> > 
> > why do you keep saying it's unintentional?  it's clearly intended.
> 
> Because they are short on actual bugs to be found by their tooling
> and attempt to inflate the sound/noise rate; therefore, every time

"short on bugs"? We're trying to drive it to zero. I would *love* to be
short on bugs. See my reply[1] to Ted.

> when overflow _IS_ handled correctly, it must have been an accident -
> we couldn't have possibly done the analysis correctly.  And if somebody
> insists that they _are_ capable of basic math, they must be dishonest.
> So... "unintentional" it's going to be.

As Justin said, this is a poor choice in wording. In other cases I've
tried to describe this as making changes so that intent is unambiguous
(to both a human and a compiler).

>  Math is hard, mmkay?  
> 
> Al, more than slightly annoyed by that aspect of the entire thing...

I'm sorry about that. None of this is a commentary on code correctness;
we're just trying to refactor things so that the compiler can help us
catch the _unintended_ overflows. This one is _intended_, so here we are
to find a palatable way to leave the behavior unchanged while gaining
compiler coverage.

-Kees

[1] https://lore.kernel.org/linux-hardening/202405171329.019F2F566C@keescook/

-- 
Kees Cook



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-17 Thread Kees Cook
On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote:
> On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote:
> > 
> > It is incredibly important that the exact opposite approach is taken;
> > we need to be annotating (or adding type qualifiers to) the _expected_
> > overflow cases. The omniscience required to go and properly annotate
> > all the spots that will cause problems would suggest that we should
> > just fix the bug outright. If only it was that easy.
> 
> It certainly isn't easy, yes.  But the problem is when you dump a huge
> amount of work and pain onto kernel developers, when they haven't
> signed up for it, when they don't necessarily have the time to do all
> of the work themselves, and when their corporate overlords won't given
> them the headcount to handle unfunded mandates which folks who are
> looking for a bright new wonderful future --- don't be surprised if
> kernel developers push back hard.

I never claimed this to be some kind of "everyone has to stop and make
these changes" event. I even talked about how it would be gradual and
not break existing code (all "WARN and continue anyway"). This is what
we've been doing with the array bounds work. Lots of people are helping
with that, but a lot of those patches have been from Gustavo and me; we
tried to keep the burden away from other developers as much as we could.

> One of the big problems that we've seen with many of these security
> initiatives is that the teams create these unfunded mandates get their
> performance reviews based on how many "bug reports" that they file,
> regardless of whether they are real problems or not.  This has been a
> problem with syzkaller, and with clusterfuzz.  Let's not make this
> problem worse with new and fancy sanitizers, please.

Are you talking about *my* security initiatives? I've been doing this
kind work in the kernel for 10 years, and at no time has "bug report
count" been a metric. In fact, the whole goal is making it _impossible_
to have a bug. (e.g. VLA removal, switch fallthrough, etc). My drive has
been to kill entire classes of bugs.

The use of sanitizers isn't to just bolster fuzzers (though they're
helpful for finding false positives). It's to use the sanitizers _in
production_, to stop flaws from being exploitable. Android has enabled
the array bounds sanitizer in trap mode for at least 2 years now. We
want the kernel to be self-protective; pro-actively catching flaws.

> Unfortunately, I don't get funding from my employer to clear these
> kinds of reports, so when I do the work, it happens on the weekends or
> late at night, on my own time, which is probably why I am so grumpy

As for the work itself, like I mentioned before, most of these get fixed
my Gustavo, me, and now Justin too. And many other folks step up to help
out, which is great. Some get complex and other maintainers get involved
too, but it's slow and steady. We're trying to reduce the frequency of
the "fires" people have to scramble to deal with.

The "not getting paid by Google to [fix syzkaller bugs]" part, I'm
surprised by. A big part of my Google job role is the upstream work I do
not only on security hardening but also on seccomp, pstore, execve/binfmt,
strings, etc. I'll reach out offline to find out more details.

> about this.  Whether you call this "sharpening our focus", or "year of
> efficiency", or pick your corporate buzzwords, it really doesn't
> matter.  The important thing is that the figure of merit must NOT be
> "how many security bugs that are found", but how much bullsh*t noise
> do these security features create, and how do you decrease overhead by
> upstream developers to deal with the fuzzing/ubsan/security tools
> find.

I guess I *do* worry about bug counts, but only in that I want them to
be _zero_. I know other folks aren't as adamant about eliminating bug
classes, but it's really not hyperbole that bugs in Linux kill people.
If you think I'm engaging in corporate buzzword shenanigans, then I have
a lot more work to do on explaining the rationale behind the security
hardening efforts.

All this said, yes, I hear what you (and Linus and others) have been
saying about minimizing the burden on other developers. I have tried my
best to keep it limited, but some things are more front-and-center (like
unexpected integer overflows), so that's why I wanted to get feedback on
how to roll it out -- I didn't see a way to make these changes in a way
that would be as unintrusive(?) as our prior efforts.

It has felt like the biggest pain point has been helping developers
shift their perspective about C: it has been more than a fancy assembler
for several decades, and we can lean on those features (and create new
ones) that shift the burden of correctness to the compiler from the
human. This does mean we need to change some styles to be more
"observable" (and unambiguous) for the compiler, though.

I think a great example recently was Peter's work creating "cleanup.h".
This feature 

Re: [PATCH v2] wifi: mac80211: Avoid address calculations via out of bounds array indexing

2024-05-17 Thread Simon Horman
On Fri, May 17, 2024 at 10:54:20AM -0400, Kenton Groombridge wrote:
> req->n_channels must be set before req->channels[] can be used.
> 
> This patch fixes one of the issues encountered in [1].
> 
> [   83.964252] [ cut here ]
> [   83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4
> [   83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]'
> [   83.964260] CPU: 0 PID: 1695 Comm: iwd Tainted: G   OT 
> 6.8.9-gentoo-hardened1 #1
> [   83.964262] Hardware name: System76 Pangolin/Pangolin, BIOS 
> ARB928_V00.01_T0025ASY1_ms 04/20/2023
> [   83.964264] Call Trace:
> [   83.964267]  
> [   83.964269]  dump_stack_lvl+0x3f/0xc0
> [   83.964274]  __ubsan_handle_out_of_bounds+0xec/0x110
> [   83.964278]  ieee80211_prep_hw_scan+0x2db/0x4b0
> [   83.964281]  __ieee80211_start_scan+0x601/0x990
> [   83.964284]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964287]  ? cfg80211_scan+0x149/0x250
> [   83.964291]  nl80211_trigger_scan+0x874/0x980
> [   83.964295]  genl_family_rcv_msg_doit+0xe8/0x160
> [   83.964298]  genl_rcv_msg+0x240/0x270
> [   83.964301]  ? __cfi_nl80211_trigger_scan+0x10/0x10
> [   83.964302]  ? __cfi_nl80211_post_doit+0x10/0x10
> [   83.964304]  ? __cfi_nl80211_pre_doit+0x10/0x10
> [   83.964307]  ? __cfi_genl_rcv_msg+0x10/0x10
> [   83.964309]  netlink_rcv_skb+0x102/0x130
> [   83.964312]  genl_rcv+0x23/0x40
> [   83.964314]  netlink_unicast+0x23b/0x340
> [   83.964316]  netlink_sendmsg+0x3a9/0x450
> [   83.964319]  __sys_sendto+0x3ae/0x3c0
> [   83.964324]  __x64_sys_sendto+0x21/0x40
> [   83.964326]  do_syscall_64+0x90/0x150
> [   83.964329]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964331]  ? syscall_exit_work+0xc2/0xf0
> [   83.964333]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964335]  ? syscall_exit_to_user_mode+0x74/0xa0
> [   83.964337]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964339]  ? do_syscall_64+0x9c/0x150
> [   83.964340]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964342]  ? syscall_exit_to_user_mode+0x74/0xa0
> [   83.964344]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964346]  ? do_syscall_64+0x9c/0x150
> [   83.964347]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964349]  ? do_syscall_64+0x9c/0x150
> [   83.964351]  ? srso_alias_return_thunk+0x5/0xfbef5
u> [   83.964353]  ? syscall_exit_work+0xc2/0xf0
> [   83.964354]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964356]  ? syscall_exit_to_user_mode+0x74/0xa0
> [   83.964358]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964359]  ? do_syscall_64+0x9c/0x150
> [   83.964361]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964362]  ? do_user_addr_fault+0x488/0x620
> [   83.964366]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964367]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   83.964369]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
> [   83.964372] RIP: 0033:0x6200808578d7
> [   83.964374] Code: 00 00 90 f3 0f 1e fa 41 56 55 41 89 ce 48 83 ec 28 80 3d 
> 7b f7 0d 00 00 74 29 45 31 c9 45 31 c0 41 89 ca b8 2c 00 00 00 0f 05 <48> 3d 
> 00 f0 ff ff 77 71 48 83 c4 28 5d 41 5e c3 66 0f 1f 84 00 00
> [   83.964375] RSP: 002b:730c4e821530 EFLAGS: 0246 ORIG_RAX: 
> 002c
> [   83.964378] RAX: ffda RBX: 06dbc456c570 RCX: 
> 6200808578d7
> [   83.964379] RDX: 005c RSI: 06dbc45884f0 RDI: 
> 0004
> [   83.964381] RBP: 0004 R08:  R09: 
> 
> [   83.964382] R10:  R11: 0246 R12: 
> 06dbc456c480
> [   83.964383] R13: 06dbc456c450 R14:  R15: 
> 06dbc456c610
> [   83.964386]  
> [   83.964386] ---[ end trace ]---
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=218810
> 
> v1->v2:
> - Drop changes in cfg80211 as requested by Johannes
> 
> Co-authored-by: Kees Cook 
> Signed-off-by: Kenton Groombridge 

Thanks Kenton,

FWWIW, it seems unfortunate to me that the __counted_by field (n_channels)
is set some distance away from the allocation of the flex-array (channels)
whose bounds it checks. It seems it would be pretty easy for a bug in the
code being updated here to result in an overrun.

But in any case, I think this is an improvement and seems correct to me.

Reviewed-by: Simon Horman 



Re: [PATCH v3] fs: fix unintentional arithmetic wraparound in offset calculation

2024-05-17 Thread Justin Stitt
Hi,

On Thu, May 16, 2024 at 6:13 PM Matthew Wilcox  wrote:
>
> On Fri, May 17, 2024 at 12:29:06AM +, Justin Stitt wrote:
> > When running syzkaller with the newly reintroduced signed integer
> > overflow sanitizer we encounter this report:
>
> why do you keep saying it's unintentional?  it's clearly intended.

Right, "unintentional" is a poor choice of phrasing. I actually mean:
"overflow-checking arithmetic was done in a way that intrinsically
causes an overflow (wraparound)".

I can clearly see the intent of the code; there's even comments saying
exactly what it does: "/* Ensure offsets don't wrap. */"... So the
thinking is: let's use the overflow-checking helpers so we can get a
good signal through the sanitizers on _real_ bugs, especially in spots
with no bounds handling.


Thanks
Justin



[PATCH v2] ntp: remove accidental integer wrap-around

2024-05-17 Thread Justin Stitt
Using syzkaller alongside the newly reintroduced signed integer overflow
sanitizer spits out this report:

UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
9223372036854775807 + 500 cannot be represented in type 'long'
Call Trace:
 
 dump_stack_lvl+0x93/0xd0
 handle_overflow+0x171/0x1b0
 second_overflow+0x2d6/0x500
 accumulate_nsecs_to_secs+0x60/0x160
 timekeeping_advance+0x1fe/0x890
 update_wall_time+0x10/0x30
...

time_maxerror is unconditionally incremented and the result is checked
against NTP_PHASE_LIMIT, but the increment itself can overflow,
resulting in wrap-around to negative space.

The user can supply some crazy values which is causing the overflow. Add
an extra validation step checking that maxerror is reasonable.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/354
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt 
---
Changes in v2:
- update commit log (thanks Thomas)
- check for sane user input during validation (thanks Thomas)
- Link to v1: 
https://lore.kernel.org/r/20240507-b4-sio-ntp-usec-v1-1-15003fc9c...@google.com
---

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Here's the syzkaller reproducer:
| #{Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| #SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| #NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| #DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| #IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| #HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| #Fault:false FaultCall:0 FaultNth:0}}
| clock_adjtime(0x0, &(0x7f00)={0x5, 0x1, 0x40,
| 0x7fff, 0x8, 0xb2, 0x256, 0x6, 0x5, 0x8001, 0x9, 0x3f, 0x0,
| 0x8000, 0x800, 0x64d, 0x5, 0x7ff, 0x8001, 0x1f, 0x3,
| 0xfff, 0x7fff, 0x5, 0x100, 0x4})

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 kernel/time/timekeeping.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b58dffc58a8f..321f251c02aa 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2388,6 +2388,11 @@ static int timekeeping_validate_timex(const struct 
__kernel_timex *txc)
}
}
 
+   if (txc->modes & ADJ_MAXERROR) {
+   if (txc->maxerror < 0 || txc->maxerror > NTP_PHASE_LIMIT)
+   return -EINVAL;
+   }
+
/*
 * Check for potential multiplication overflows that can
 * only happen on 64-bit systems:

---
base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc
change-id: 20240507-b4-sio-ntp-usec-1a3ab67bdce1

Best regards,
--
Justin Stitt 




Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

2024-05-17 Thread Doug Anderson
Hi,

On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson  wrote:
>
> When testing hard lockup handling on my sc7180-trogdor-lazor device
> with pseudo-NMI enabled, with serial console enabled and with kgdb
> disabled, I found that the stack crawls printed to the serial console
> ended up as a jumbled mess. After rebooting, the pstore-based console
> looked fine though. Also, enabling kgdb to trap the panic made the
> console look fine and avoided the mess.
>
> After a bit of tracking down, I came to the conclusion that this was
> what was happening:
> 1. The panic path was stopping all other CPUs with
>panic_other_cpus_shutdown().
> 2. At least one of those other CPUs was in the middle of printing to
>the serial console and holding the console port's lock, which is
>grabbed with "irqsave". ...but since we were stopping with an NMI
>we didn't care about the "irqsave" and interrupted anyway.
> 3. Since we stopped the CPU while it was holding the lock it would
>never release it.
> 4. All future calls to output to the console would end up failing to
>get the lock in qcom_geni_serial_console_write(). This isn't
>_totally_ unexpected at panic time but it's a code path that's not
>well tested, hard to get right, and apparently doesn't work
>terribly well on the Qualcomm geni serial driver.
>
> It would probably be a reasonable idea to try to make the Qualcomm
> geni serial driver work better, but also it's nice not to get into
> this situation in the first place.
>
> Taking a page from what x86 appears to do in native_stop_other_cpus(),
> let's do this:
> 1. First, we'll try to stop other CPUs with a normal IPI and wait a
>second. This gives them a chance to leave critical sections.
> 2. If CPUs fail to stop then we'll retry with an NMI, but give a much
>lower timeout since there's no good reason for a CPU not to react
>quickly to a NMI.
>
> This works well and avoids the corrupted console and (presumably)
> could help avoid other similar issues.
>
> In order to do this, we need to do a little re-organization of our
> IPIs since we don't have any more free IDs. We'll do what was
> suggested in previous conversations and combine "stop" and "crash
> stop". That frees up an IPI so now we can have a "stop" and "stop
> NMI".
>
> In order to do this we also need a slight change in the way we keep
> track of which CPUs still need to be stopped. We need to know
> specifically which CPUs haven't stopped yet when we fall back to NMI
> but in the "crash stop" case the "cpu_online_mask" isn't updated as
> CPUs go down. This is why that code path had an atomic of the number
> of CPUs left. We'll solve this by making the cpumask into a
> global. This has a potential memory implication--with NR_CPUs = 4096
> this is 4096/8 = 512 bytes of globals. On the upside in that same case
> we take 512 bytes off the stack which could potentially have made the
> stop code less reliable. It can be noted that the NMI backtrace code
> (lib/nmi_backtrace.c) uses the same approach and that use also
> confirms that updating the mask is safe from NMI.
>
> All of the above lets us combine the logic for "stop" and "crash stop"
> code, which appeared to have a bunch of arbitrary implementation
> differences. Possibly this could make up for some of the 512 wasted
> bytes. ;-)
>
> Aside from the above change where we try a normal IPI and then an NMI,
> the combined function has a few subtle differences:
> * In the normal smp_send_stop(), if we fail to stop one or more CPUs
>   then we won't include the current CPU (the one running
>   smp_send_stop()) in the error message.
> * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print
>   the CPUs that we failed to stop instead of printing all _but_ the
>   current running CPU.
> * In crash_smp_send_stop(), we will now only print "SMP: stopping
>   secondary CPUs" if (system_state <= SYSTEM_RUNNING).
>
> Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should 
> try for NMI")
> Signed-off-by: Douglas Anderson 
> ---
> I'm not setup to test the crash_smp_send_stop(). I made sure it
> compiled and hacked the panic() method to call it, but I haven't
> actually run kexec. Hopefully others can confirm that it's working for
> them.
>
>  arch/arm64/kernel/smp.c | 115 +++-
>  1 file changed, 54 insertions(+), 61 deletions(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index defbab84e9e5..9fe9d4342517 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -71,7 +71,7 @@ enum ipi_msg_type {
> IPI_RESCHEDULE,
> IPI_CALL_FUNC,
> IPI_CPU_STOP,
> -   IPI_CPU_CRASH_STOP,
> +   IPI_CPU_STOP_NMI,
> IPI_TIMER,
> IPI_IRQ_WORK,
> NR_IPI,
> @@ -88,6 +88,9 @@ static int ipi_irq_base __ro_after_init;
>  static int nr_ipi __ro_after_init = NR_IPI;
>  static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
>
> 

Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

2024-05-17 Thread Doug Anderson
Hi,

On Fri, Apr 12, 2024 at 6:55 AM Will Deacon  wrote:
>
> Hi Doug,
>
> I'm doing some inbox Spring cleaning!

No worries. I got your reply while I was on a bunch of business travel
and finally cleared stuff out enough to take a look again. ;-)


> On Thu, Dec 07, 2023 at 05:02:56PM -0800, Douglas Anderson wrote:
> > When testing hard lockup handling on my sc7180-trogdor-lazor device
> > with pseudo-NMI enabled, with serial console enabled and with kgdb
> > disabled, I found that the stack crawls printed to the serial console
> > ended up as a jumbled mess. After rebooting, the pstore-based console
> > looked fine though. Also, enabling kgdb to trap the panic made the
> > console look fine and avoided the mess.
> >
> > After a bit of tracking down, I came to the conclusion that this was
> > what was happening:
> > 1. The panic path was stopping all other CPUs with
> >panic_other_cpus_shutdown().
> > 2. At least one of those other CPUs was in the middle of printing to
> >the serial console and holding the console port's lock, which is
> >grabbed with "irqsave". ...but since we were stopping with an NMI
> >we didn't care about the "irqsave" and interrupted anyway.
> > 3. Since we stopped the CPU while it was holding the lock it would
> >never release it.
> > 4. All future calls to output to the console would end up failing to
> >get the lock in qcom_geni_serial_console_write(). This isn't
> >_totally_ unexpected at panic time but it's a code path that's not
> >well tested, hard to get right, and apparently doesn't work
> >terribly well on the Qualcomm geni serial driver.
> >
> > It would probably be a reasonable idea to try to make the Qualcomm
> > geni serial driver work better, but also it's nice not to get into
> > this situation in the first place.
> >
> > Taking a page from what x86 appears to do in native_stop_other_cpus(),
> > let's do this:
> > 1. First, we'll try to stop other CPUs with a normal IPI and wait a
> >second. This gives them a chance to leave critical sections.
> > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much
> >lower timeout since there's no good reason for a CPU not to react
> >quickly to a NMI.
> >
> > This works well and avoids the corrupted console and (presumably)
> > could help avoid other similar issues.
> >
> > In order to do this, we need to do a little re-organization of our
> > IPIs since we don't have any more free IDs. We'll do what was
> > suggested in previous conversations and combine "stop" and "crash
> > stop". That frees up an IPI so now we can have a "stop" and "stop
> > NMI".
> >
> > In order to do this we also need a slight change in the way we keep
> > track of which CPUs still need to be stopped. We need to know
> > specifically which CPUs haven't stopped yet when we fall back to NMI
> > but in the "crash stop" case the "cpu_online_mask" isn't updated as
> > CPUs go down. This is why that code path had an atomic of the number
> > of CPUs left. We'll solve this by making the cpumask into a
> > global. This has a potential memory implication--with NR_CPUs = 4096
> > this is 4096/8 = 512 bytes of globals. On the upside in that same case
> > we take 512 bytes off the stack which could potentially have made the
> > stop code less reliable. It can be noted that the NMI backtrace code
> > (lib/nmi_backtrace.c) uses the same approach and that use also
> > confirms that updating the mask is safe from NMI.
>
> Updating the global masks without any synchronisation feels broken though:
>
> > @@ -1085,77 +1080,75 @@ void smp_send_stop(void)
> >  {
> >   unsigned long timeout;
> >
> > - if (num_other_online_cpus()) {
> > - cpumask_t mask;
> > + /*
> > +  * If this cpu is the only one alive at this point in time, online or
> > +  * not, there are no stop messages to be sent around, so just back 
> > out.
> > +  */
> > + if (num_other_online_cpus() == 0)
> > + goto skip_ipi;
> >
> > - cpumask_copy(, cpu_online_mask);
> > - cpumask_clear_cpu(smp_processor_id(), );
> > + cpumask_copy(to_cpumask(stop_mask), cpu_online_mask);
> > + cpumask_clear_cpu(smp_processor_id(), to_cpumask(stop_mask));
>
> I don't see what prevents multiple CPUs getting in here concurrently and
> tripping over the masks. x86 seems to avoid that with an atomic
> 'stopping_cpu' variable in native_stop_other_cpus(). Do we need something
> similar?

Good point. nmi_trigger_cpumask_backtrace(), which my code was based
on, has a test_and_set() for this and that seems simpler than the
atomic_try_cmpxchg() from the x86 code.

If we run into that case, what do you think we should do? I guess x86
just does a "return", though it feels like at least a warning should
be printed since we're not doing what the function asked us to do.
When we return there will be other CPUs running.

In theory, we could try to help the other processor along? I don't
know how much 

[PATCH v3 2/2] tty: rfcomm: prefer array indexing over pointer arithmetic

2024-05-17 Thread Erick Archer
Refactor the list_for_each_entry() loop of rfcomm_get_dev_list()
function to use array indexing instead of pointer arithmetic.

This way, the code is more readable and idiomatic.

Reviewed-by: Kees Cook 
Signed-off-by: Erick Archer 
---
 net/bluetooth/rfcomm/tty.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
index 44b781e7569e..af80d599c337 100644
--- a/net/bluetooth/rfcomm/tty.c
+++ b/net/bluetooth/rfcomm/tty.c
@@ -527,12 +527,12 @@ static int rfcomm_get_dev_list(void __user *arg)
list_for_each_entry(dev, _dev_list, list) {
if (!tty_port_get(>port))
continue;
-   (di + n)->id  = dev->id;
-   (di + n)->flags   = dev->flags;
-   (di + n)->state   = dev->dlc->state;
-   (di + n)->channel = dev->channel;
-   bacpy(&(di + n)->src, >src);
-   bacpy(&(di + n)->dst, >dst);
+   di[n].id  = dev->id;
+   di[n].flags   = dev->flags;
+   di[n].state   = dev->dlc->state;
+   di[n].channel = dev->channel;
+   bacpy([n].src, >src);
+   bacpy([n].dst, >dst);
tty_port_put(>port);
if (++n >= dev_num)
break;
-- 
2.25.1




[PATCH v3 1/2] tty: rfcomm: prefer struct_size over open coded arithmetic

2024-05-17 Thread Erick Archer
This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

As the "dl" variable is a pointer to "struct rfcomm_dev_list_req" and
this structure ends in a flexible array:

struct rfcomm_dev_list_req {
[...]
struct   rfcomm_dev_info dev_info[];
};

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the calculation "size + count * size" in
the kzalloc() and copy_to_user() functions.

At the same time, prepare for the coming implementation by GCC and Clang
of the __counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions).

In this case, it is important to note that the logic needs a little
refactoring to ensure that the "dev_num" member is initialized before
the first access to the flex array. Specifically, add the assignment
before the list_for_each_entry() loop.

Also remove the "size" variable as it is no longer needed.

This way, the code is more readable and safer.

This code was detected with the help of Coccinelle, and audited and
modified manually.

Link: 
https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
 [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Reviewed-by: Kees Cook 
Signed-off-by: Erick Archer 
---
 include/net/bluetooth/rfcomm.h |  2 +-
 net/bluetooth/rfcomm/tty.c | 11 ---
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/net/bluetooth/rfcomm.h b/include/net/bluetooth/rfcomm.h
index 99d26879b02a..c05882476900 100644
--- a/include/net/bluetooth/rfcomm.h
+++ b/include/net/bluetooth/rfcomm.h
@@ -355,7 +355,7 @@ struct rfcomm_dev_info {
 
 struct rfcomm_dev_list_req {
u16  dev_num;
-   struct   rfcomm_dev_info dev_info[];
+   struct   rfcomm_dev_info dev_info[] __counted_by(dev_num);
 };
 
 int  rfcomm_dev_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
index 69c75c041fe1..44b781e7569e 100644
--- a/net/bluetooth/rfcomm/tty.c
+++ b/net/bluetooth/rfcomm/tty.c
@@ -504,7 +504,7 @@ static int rfcomm_get_dev_list(void __user *arg)
struct rfcomm_dev *dev;
struct rfcomm_dev_list_req *dl;
struct rfcomm_dev_info *di;
-   int n = 0, size, err;
+   int n = 0, err;
u16 dev_num;
 
BT_DBG("");
@@ -515,12 +515,11 @@ static int rfcomm_get_dev_list(void __user *arg)
if (!dev_num || dev_num > (PAGE_SIZE * 4) / sizeof(*di))
return -EINVAL;
 
-   size = sizeof(*dl) + dev_num * sizeof(*di);
-
-   dl = kzalloc(size, GFP_KERNEL);
+   dl = kzalloc(struct_size(dl, dev_info, dev_num), GFP_KERNEL);
if (!dl)
return -ENOMEM;
 
+   dl->dev_num = dev_num;
di = dl->dev_info;
 
mutex_lock(_dev_lock);
@@ -542,9 +541,7 @@ static int rfcomm_get_dev_list(void __user *arg)
mutex_unlock(_dev_lock);
 
dl->dev_num = n;
-   size = sizeof(*dl) + n * sizeof(*di);
-
-   err = copy_to_user(arg, dl, size);
+   err = copy_to_user(arg, dl, struct_size(dl, dev_info, n));
kfree(dl);
 
return err ? -EFAULT : 0;
-- 
2.25.1




[PATCH v3 0/2] tty: rfcomm: refactor rfcomm_get_dev_list() function

2024-05-17 Thread Erick Archer
This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

As the "dl" variable is a pointer to "struct rfcomm_dev_list_req" and
this structure ends in a flexible array:

struct rfcomm_dev_list_req {
[...]
struct   rfcomm_dev_info dev_info[];
};

the preferred way in the kernel is to use the struct_size() helper to
do the arithmetic instead of the calculation "size + count * size" in
the kzalloc() and copy_to_user() functions.

At the same time, prepare for the coming implementation by GCC and Clang
of the __counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions).

In this case, it is important to note that the logic needs a little
refactoring to ensure that the "dev_num" member is initialized before
the first access to the flex array. Specifically, add the assignment
before the list_for_each_entry() loop.

Also remove the "size" variable as it is no longer needed and refactor
the list_for_each_entry() loop to use di[n] instead of (di + n).

This way, the code is more readable, idiomatic and safer.

This code was detected with the help of Coccinelle, and audited and
modified manually.

Specifically, the first patch is related to the struct_size() helper
and the second patch refactors the list_for_each_entry() loop to use
array indexing instead of pointer arithmetic.

Regards,
Erick

Link: 
https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
 [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
---
Changes in v3:
- Add the "Reviewed-by:" tags.
- Split the changes in two commits (Jiri Slaby, Luiz Augusto von Dentz).

Changes in v2:
- Add the __counted_by() attribute (Kees Cook).
- Refactor the list_for_each_entry() loop to use di[n] instead of
  (di + n) (Kees Cook).

Previous versions:
v2 -> 
https://lore.kernel.org/linux-hardening/as8pr02mb7237262c62b054fabd7229168b...@as8pr02mb7237.eurprd02.prod.outlook.com/
v1 -> 
https://lore.kernel.org/linux-hardening/as8pr02mb723725e0069f7ae8f64e876e8b...@as8pr02mb7237.eurprd02.prod.outlook.com/
---
Erick Archer (2):
  tty: rfcomm: prefer struct_size over open coded arithmetic
  tty: rfcomm: prefer array indexing over pointer arithmetic

 include/net/bluetooth/rfcomm.h |  2 +-
 net/bluetooth/rfcomm/tty.c | 23 ++-
 2 files changed, 11 insertions(+), 14 deletions(-)

-- 
2.25.1




[PATCH 1/2] wifi: cfg80211: use __counted_by where appropriate

2024-05-17 Thread Dmitry Antipov
Annotate 'sub_specs' of 'struct cfg80211_sar_specs', 'channels'
of 'struct cfg80211_sched_scan_request', 'channels' of 'struct
cfg80211_wowlan_nd_match', and 'matches' of 'struct
cfg80211_wowlan_nd_info' with '__counted_by' attribute. Briefly
tested with clang 18.1.1 and CONFIG_UBSAN_BOUNDS running iwlwifi.

Signed-off-by: Dmitry Antipov 
---
 include/net/cfg80211.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index cbf1664dc569..d79180bec7a1 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2200,7 +2200,7 @@ struct cfg80211_sar_sub_specs {
 struct cfg80211_sar_specs {
enum nl80211_sar_type type;
u32 num_sub_specs;
-   struct cfg80211_sar_sub_specs sub_specs[];
+   struct cfg80211_sar_sub_specs sub_specs[] __counted_by(num_sub_specs);
 };
 
 
@@ -2838,7 +2838,7 @@ struct cfg80211_sched_scan_request {
struct list_head list;
 
/* keep last */
-   struct ieee80211_channel *channels[];
+   struct ieee80211_channel *channels[] __counted_by(n_channels);
 };
 
 /**
@@ -3582,7 +3582,7 @@ struct cfg80211_coalesce {
 struct cfg80211_wowlan_nd_match {
struct cfg80211_ssid ssid;
int n_channels;
-   u32 channels[];
+   u32 channels[] __counted_by(n_channels);
 };
 
 /**
@@ -3596,7 +3596,7 @@ struct cfg80211_wowlan_nd_match {
  */
 struct cfg80211_wowlan_nd_info {
int n_matches;
-   struct cfg80211_wowlan_nd_match *matches[];
+   struct cfg80211_wowlan_nd_match *matches[] __counted_by(n_matches);
 };
 
 /**
-- 
2.45.1




[PATCH 2/2] wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan()

2024-05-17 Thread Dmitry Antipov
When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've
noticed the following:

UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1
Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...>
Call Trace:
 
 dump_stack_lvl+0x2d/0x90
 __ubsan_handle_out_of_bounds+0xe7/0x140
 ? timerqueue_add+0x98/0xb0
 ieee80211_prep_hw_scan+0x2db/0x480 [mac80211]
 ? __kmalloc+0xe1/0x470
 __ieee80211_start_scan+0x541/0x760 [mac80211]
 rdev_scan+0x1f/0xe0 [cfg80211]
 nl80211_trigger_scan+0x9b6/0xae0 [cfg80211]
 ...

Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels'
uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't
be checked in 'ieee80211_prep_hw_scan()'. Although an initialization
of 'hw_scan_req->req.n_channels' introduces some confusion around
allocated vs. used VLA members, this shouldn't be a problem since
everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'.

Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the
convenient 'struct_size()' as well.

Signed-off-by: Dmitry Antipov 
---
 net/mac80211/scan.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 3da1c5c45035..ab2a11a02673 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -745,14 +745,19 @@ static int __ieee80211_start_scan(struct 
ieee80211_sub_if_data *sdata,
}
 
local->hw_scan_req = kmalloc(
-   sizeof(*local->hw_scan_req) +
-   req->n_channels * sizeof(req->channels[0]) +
-   local->hw_scan_ies_bufsize, GFP_KERNEL);
+   (struct_size(local->hw_scan_req,
+req.channels, req->n_channels)
++ local->hw_scan_ies_bufsize), GFP_KERNEL);
if (!local->hw_scan_req)
return -ENOMEM;
 
local->hw_scan_req->req.ssids = req->ssids;
local->hw_scan_req->req.n_ssids = req->n_ssids;
+   /* None of the channels are actually set
+* up but let UBSAN know the boundaries.
+*/
+   local->hw_scan_req->req.n_channels = req->n_channels;
+
ies = (u8 *)local->hw_scan_req +
sizeof(*local->hw_scan_req) +
req->n_channels * sizeof(req->channels[0]);
-- 
2.45.1




Re: [PATCH 05/12] dmaengine: Add STM32 DMA3 support

2024-05-17 Thread Frank Li
On Fri, May 17, 2024 at 11:42:17AM +0200, Amelie Delaunay wrote:
> On 5/16/24 19:09, Frank Li wrote:
> > On Thu, May 16, 2024 at 05:25:58PM +0200, Amelie Delaunay wrote:
> > > On 5/15/24 20:56, Frank Li wrote:
> > > > On Tue, Apr 23, 2024 at 02:32:55PM +0200, Amelie Delaunay wrote:
> > > > > STM32 DMA3 driver supports the 3 hardware configurations of the STM32 
> > > > > DMA3
> > > > > controller:
> > ...
> > > > > + writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
> > > > > + writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));
> > > > > +
> > > > > + /* Clear any pending interrupts */
> > > > > + csr = readl_relaxed(ddata->base + STM32_DMA3_CSR(id));
> > > > > + if (csr & CSR_ALL_F)
> > > > > + writel_relaxed(csr, ddata->base + STM32_DMA3_CFCR(id));
> > > > > +
> > > > > + stm32_dma3_chan_dump_reg(chan);
> > > > > +
> > > > > + ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id));
> > > > > + writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id));
> > > > 
> > > > This one should use writel instead of writel_relaxed because it need
> > > > dma_wmb() as barrier for preious write complete.
> > > > 
> > > > Frank
> > > > 
> > > 
> > > ddata->base is Device memory type thanks to ioremap() use, so it is 
> > > strongly
> > > ordered and non-cacheable.
> > > DMA3 is outside CPU cluster, its registers are accessible through AHB bus.
> > > dma_wmb() (in case of writel instead of writel_relaxed) is useless in that
> > > case: it won't ensure the propagation on the bus is complete, and it will
> > > have impacts on the system.
> > > That's why CCR register is written once,  then it is read before CCR_EN is
> > > set and being written again, with _relaxed(), because registers are 
> > > behind a
> > > bus, and ioremapped with Device memory type which ensures it is strongly
> > > ordered and non-cacheable.
> > 
> > regardless memory map, writel_relaxed() just make sure io write and read is
> > orderred, not necessary order with other memory access. only readl and
> > writel make sure order with other memory read/write.
> > 
> > 1. Write src_addr to descriptor
> > 2. dma_wmb()
> > 3. Write "ready" to descriptor
> > 4. enable channel or doorbell by write a register.
> > 
> > if 4 use writel_relaxe(). because 3 write to DDR, which difference place of
> > mmio, 4 may happen before 3.  Your can refer axi order model.
> > 
> > 4 have to use ONE writel(), to make sure 3 already write to DDR.
> > 
> > You need use at least one writel() to make sure all nornmal memory finish.
> > 
> 
> +writel_relaxed(chan->swdesc->ccr, ddata->base + STM32_DMA3_CCR(id));
> +writel_relaxed(hwdesc->ctr1, ddata->base + STM32_DMA3_CTR1(id));
> +writel_relaxed(hwdesc->ctr2, ddata->base + STM32_DMA3_CTR2(id));
> +writel_relaxed(hwdesc->cbr1, ddata->base + STM32_DMA3_CBR1(id));
> +writel_relaxed(hwdesc->csar, ddata->base + STM32_DMA3_CSAR(id));
> +writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
> +writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));
> 
> These writel_relaxed() are from descriptors to DMA3 registers (descriptors
> being prepared "a long time ago" during _prep_).

You can't depend on "a long time ago" during _prep_. If later your driver
run at fast CPU. The execute time will be short.

All dma_map_sg and dma_alloc_coherence ... need at least one writel() to
make sure previous write actually reach to DDR. 

Some data may not really reach DDR, when DMA already start transfer

Please ref linux kernel document: 
Documentation/memory-barriers.txt, line 1948.

In your issue_pending(), call this function to enable channel. So need
at least one writel().

> As I said previously, DMA3 registers are outside CPU cluster, accessible
> through AHB bus, and ddata->base to address registers is ioremapped as
> Device memory type, non-cacheable and strongly ordered.
> 
> arch/arm/include/asm/io.h:
> /*
> * ioremap() and friends.
> *
> * ioremap() takes a resource address, and size.  Due to the ARM memory
> * types, it is important to use the correct ioremap() function as each
> * mapping has specific properties.
> *
> * FunctionMemory type CacheabilityCache hint
> * *ioremap()* *Device**n/a*   *n/a*
> * ioremap_cache() Normal  Writeback   Read allocate
> * ioremap_wc()Normal  Non-cacheable   n/a
> * ioremap_wt()Normal  Non-cacheable   n/a
> *
> * All device mappings have the following properties:
> * - no access speculation
> * - no repetition (eg, on return from an exception)
> * - number, order and size of accesses are maintained
> * - unaligned accesses are "unpredictable"
> * - writes may be delayed before they hit the endpoint device
> 
> On our platforms, we know that to ensure the writes have hit the endpoint
> device (aka DMA3 registers), a read have to be done before.
> And that's 

[PATCH v2] wifi: mac80211: Avoid address calculations via out of bounds array indexing

2024-05-17 Thread Kenton Groombridge
req->n_channels must be set before req->channels[] can be used.

This patch fixes one of the issues encountered in [1].

[   83.964252] [ cut here ]
[   83.964255] UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:364:4
[   83.964258] index 0 is out of range for type 'struct ieee80211_channel *[]'
[   83.964260] CPU: 0 PID: 1695 Comm: iwd Tainted: G   OT 
6.8.9-gentoo-hardened1 #1
[   83.964262] Hardware name: System76 Pangolin/Pangolin, BIOS 
ARB928_V00.01_T0025ASY1_ms 04/20/2023
[   83.964264] Call Trace:
[   83.964267]  
[   83.964269]  dump_stack_lvl+0x3f/0xc0
[   83.964274]  __ubsan_handle_out_of_bounds+0xec/0x110
[   83.964278]  ieee80211_prep_hw_scan+0x2db/0x4b0
[   83.964281]  __ieee80211_start_scan+0x601/0x990
[   83.964284]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964287]  ? cfg80211_scan+0x149/0x250
[   83.964291]  nl80211_trigger_scan+0x874/0x980
[   83.964295]  genl_family_rcv_msg_doit+0xe8/0x160
[   83.964298]  genl_rcv_msg+0x240/0x270
[   83.964301]  ? __cfi_nl80211_trigger_scan+0x10/0x10
[   83.964302]  ? __cfi_nl80211_post_doit+0x10/0x10
[   83.964304]  ? __cfi_nl80211_pre_doit+0x10/0x10
[   83.964307]  ? __cfi_genl_rcv_msg+0x10/0x10
[   83.964309]  netlink_rcv_skb+0x102/0x130
[   83.964312]  genl_rcv+0x23/0x40
[   83.964314]  netlink_unicast+0x23b/0x340
[   83.964316]  netlink_sendmsg+0x3a9/0x450
[   83.964319]  __sys_sendto+0x3ae/0x3c0
[   83.964324]  __x64_sys_sendto+0x21/0x40
[   83.964326]  do_syscall_64+0x90/0x150
[   83.964329]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964331]  ? syscall_exit_work+0xc2/0xf0
[   83.964333]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964335]  ? syscall_exit_to_user_mode+0x74/0xa0
[   83.964337]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964339]  ? do_syscall_64+0x9c/0x150
[   83.964340]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964342]  ? syscall_exit_to_user_mode+0x74/0xa0
[   83.964344]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964346]  ? do_syscall_64+0x9c/0x150
[   83.964347]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964349]  ? do_syscall_64+0x9c/0x150
[   83.964351]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964353]  ? syscall_exit_work+0xc2/0xf0
[   83.964354]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964356]  ? syscall_exit_to_user_mode+0x74/0xa0
[   83.964358]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964359]  ? do_syscall_64+0x9c/0x150
[   83.964361]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964362]  ? do_user_addr_fault+0x488/0x620
[   83.964366]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964367]  ? srso_alias_return_thunk+0x5/0xfbef5
[   83.964369]  entry_SYSCALL_64_after_hwframe+0x55/0x5d
[   83.964372] RIP: 0033:0x6200808578d7
[   83.964374] Code: 00 00 90 f3 0f 1e fa 41 56 55 41 89 ce 48 83 ec 28 80 3d 
7b f7 0d 00 00 74 29 45 31 c9 45 31 c0 41 89 ca b8 2c 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 71 48 83 c4 28 5d 41 5e c3 66 0f 1f 84 00 00
[   83.964375] RSP: 002b:730c4e821530 EFLAGS: 0246 ORIG_RAX: 
002c
[   83.964378] RAX: ffda RBX: 06dbc456c570 RCX: 6200808578d7
[   83.964379] RDX: 005c RSI: 06dbc45884f0 RDI: 0004
[   83.964381] RBP: 0004 R08:  R09: 
[   83.964382] R10:  R11: 0246 R12: 06dbc456c480
[   83.964383] R13: 06dbc456c450 R14:  R15: 06dbc456c610
[   83.964386]  
[   83.964386] ---[ end trace ]---

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218810

v1->v2:
- Drop changes in cfg80211 as requested by Johannes

Co-authored-by: Kees Cook 
Signed-off-by: Kenton Groombridge 
---
 net/mac80211/scan.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 73850312580f..b88e99c211ff 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -358,7 +358,8 @@ static bool ieee80211_prep_hw_scan(struct 
ieee80211_sub_if_data *sdata)
struct cfg80211_scan_request *req;
struct cfg80211_chan_def chandef;
u8 bands_used = 0;
-   int i, ielen, n_chans;
+   int i, ielen;
+   u32 *n_chans;
u32 flags = 0;
 
req = rcu_dereference_protected(local->scan_req,
@@ -368,34 +369,34 @@ static bool ieee80211_prep_hw_scan(struct 
ieee80211_sub_if_data *sdata)
return false;
 
if (ieee80211_hw_check(>hw, SINGLE_SCAN_ON_ALL_BANDS)) {
+   local->hw_scan_req->req.n_channels = req->n_channels;
+
for (i = 0; i < req->n_channels; i++) {
local->hw_scan_req->req.channels[i] = req->channels[i];
bands_used |= BIT(req->channels[i]->band);
}
-
-   n_chans = req->n_channels;
} else {
do {
if (local->hw_scan_band == NUM_NL80211_BANDS)
return false;
 
-   n_chans = 0;

Re: [PATCH 05/12] dmaengine: Add STM32 DMA3 support

2024-05-17 Thread Amelie Delaunay

On 5/16/24 19:09, Frank Li wrote:

On Thu, May 16, 2024 at 05:25:58PM +0200, Amelie Delaunay wrote:

On 5/15/24 20:56, Frank Li wrote:

On Tue, Apr 23, 2024 at 02:32:55PM +0200, Amelie Delaunay wrote:

STM32 DMA3 driver supports the 3 hardware configurations of the STM32 DMA3
controller:

...

+   writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
+   writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));
+
+   /* Clear any pending interrupts */
+   csr = readl_relaxed(ddata->base + STM32_DMA3_CSR(id));
+   if (csr & CSR_ALL_F)
+   writel_relaxed(csr, ddata->base + STM32_DMA3_CFCR(id));
+
+   stm32_dma3_chan_dump_reg(chan);
+
+   ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id));
+   writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id));


This one should use writel instead of writel_relaxed because it need
dma_wmb() as barrier for preious write complete.

Frank



ddata->base is Device memory type thanks to ioremap() use, so it is strongly
ordered and non-cacheable.
DMA3 is outside CPU cluster, its registers are accessible through AHB bus.
dma_wmb() (in case of writel instead of writel_relaxed) is useless in that
case: it won't ensure the propagation on the bus is complete, and it will
have impacts on the system.
That's why CCR register is written once,  then it is read before CCR_EN is
set and being written again, with _relaxed(), because registers are behind a
bus, and ioremapped with Device memory type which ensures it is strongly
ordered and non-cacheable.


regardless memory map, writel_relaxed() just make sure io write and read is
orderred, not necessary order with other memory access. only readl and
writel make sure order with other memory read/write.

1. Write src_addr to descriptor
2. dma_wmb()
3. Write "ready" to descriptor
4. enable channel or doorbell by write a register.

if 4 use writel_relaxe(). because 3 write to DDR, which difference place of
mmio, 4 may happen before 3.  Your can refer axi order model.

4 have to use ONE writel(), to make sure 3 already write to DDR.

You need use at least one writel() to make sure all nornmal memory finish.



+writel_relaxed(chan->swdesc->ccr, ddata->base + STM32_DMA3_CCR(id));
+writel_relaxed(hwdesc->ctr1, ddata->base + STM32_DMA3_CTR1(id));
+writel_relaxed(hwdesc->ctr2, ddata->base + STM32_DMA3_CTR2(id));
+writel_relaxed(hwdesc->cbr1, ddata->base + STM32_DMA3_CBR1(id));
+writel_relaxed(hwdesc->csar, ddata->base + STM32_DMA3_CSAR(id));
+writel_relaxed(hwdesc->cdar, ddata->base + STM32_DMA3_CDAR(id));
+writel_relaxed(hwdesc->cllr, ddata->base + STM32_DMA3_CLLR(id));

These writel_relaxed() are from descriptors to DMA3 registers 
(descriptors being prepared "a long time ago" during _prep_).
As I said previously, DMA3 registers are outside CPU cluster, accessible 
through AHB bus, and ddata->base to address registers is ioremapped as 
Device memory type, non-cacheable and strongly ordered.


arch/arm/include/asm/io.h:
/*
* ioremap() and friends.
*
* ioremap() takes a resource address, and size.  Due to the ARM memory
* types, it is important to use the correct ioremap() function as each
* mapping has specific properties.
*
* Function  Memory type CacheabilityCache hint
* *ioremap()*   *Device**n/a*   *n/a*
* ioremap_cache()   Normal  Writeback   Read allocate
* ioremap_wc()  Normal  Non-cacheable   n/a
* ioremap_wt()  Normal  Non-cacheable   n/a
*
* All device mappings have the following properties:
* - no access speculation
* - no repetition (eg, on return from an exception)
* - number, order and size of accesses are maintained
* - unaligned accesses are "unpredictable"
* - writes may be delayed before they hit the endpoint device

On our platforms, we know that to ensure the writes have hit the 
endpoint device (aka DMA3 registers), a read have to be done before.

And that's what is done before enabling the channel:

+ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(id));
+writel_relaxed(ccr | CCR_EN, ddata->base + STM32_DMA3_CCR(id));

If there was an issue in this part of the code, it means channel would 
be started while it is wrongly programmed. In that case, DMA3 would 
raise a User Setting Error interrupt and disable the channel. User 
Setting Error is managed in this driver 
(USEF/stm32_dma3_check_user_setting()). And we never had reached a 
situation.





+
+   chan->dma_status = DMA_IN_PROGRESS;
+
+   dev_dbg(chan2dev(chan), "vchan %pK: started\n", >vchan);
+}
+
+static int stm32_dma3_chan_suspend(struct stm32_dma3_chan *chan, bool susp)
+{
+   struct stm32_dma3_ddata *ddata = to_stm32_dma3_ddata(chan);
+   u32 csr, ccr = readl_relaxed(ddata->base + STM32_DMA3_CCR(chan->id)) & 
~CCR_EN;
+   int ret = 0;
+
+   if (susp)
+   ccr |= CCR_SUSP;
+   else
+   ccr &= ~CCR_SUSP;
+
+

Re: [PATCH] ntp: remove accidental integer wrap-around

2024-05-17 Thread Thomas Gleixner
On Thu, May 16 2024 at 16:40, Justin Stitt wrote:
> On Tue, May 14, 2024 at 3:38 AM Thomas Gleixner  wrote:
>> So how can 0xf42400 + 50/1000 overflow in the first place?
>>
>> It can't unless time_maxerror is somehow initialized to a bogus
>> value and indeed it is:
>>
>> process_adjtimex_modes()
>> 
>> if (txc->modes & ADJ_MAXERROR)
>> time_maxerror = txc->maxerror;
>>
>> So that wants to be fixed and not the symptom.
>
> Isn't this usually supplied from the user and can be some pretty
> random stuff?

Sure it comes from user space and can contain random nonsense as
syzkaller demonstrated.

> Are you suggesting we update timekeeping_validate_timex() to include a
> check to limit the maxerror field to (NTP_PHASE_LIMIT-(MAXFREQ /
> NSEC_PER_USEC))? It seems like we should handle the overflow case
> where it happens: in second_overflow().
>
> The clear intent of the existing code was to saturate at
> NTP_PHASE_LIMIT, they just did it in a way where the check itself
> triggers overflow sanitizers.

The clear intent of the code is to do saturation of a bound value.

Clearly the user space interface fails to validate the input to be in a
sane range and that makes you go and prevent the resulting overflow at
the usage site. Seriously?

Obviously the sanitizer detects the stupid in second_overflow(), but
that does not mean that the proper solution is to add overflow
protection to that code.

Tools are good to pin-point symptoms, but they are by definition
patently bad in root cause analysis. Otherwise we could just let the
tool write the "fix".

The obvious first question in such a case is to ask _WHY_ does
time_maxerror have a bogus value, which clearly cannot be achieved from
regular operation.

Once you figured out that the only way to set time_maxerror to a bogus
value is the user space interface the obvious follow up question is
whether such a value has to be considered as valid or not.

As it is obviously invalid the logical consequence is to add a sanity
check and reject that nonsense at that boundary, no?

Thanks,

tglx



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-17 Thread Jonas Oberhauser




Am 5/8/2024 um 10:07 PM schrieb Linus Torvalds:

And no, the answer is ABSOLUTELY NOT to add cognitive load on kernel
developers by adding yet more random helper types and/or functions.



Just to show an option without "more types and helper functions", one 
could also instead add a coverage requirement:


Every arithmetic operation should either:
- have a test case where the wrap around happens, or
- have a static analyser say that overflow can not happen, or
- have a static analyser say that overflow is fine (e.g., your a+b < a case)

Then the answer to safe wrap situations isn't to make the kernel code 
less readable, but to have a component-level test that shows that the 
behavior on overflow (in at least one case :)) ) is what the developer 
expected.


For static analysis to prove that overflow can not happen, one sometimes 
would need to add BUG_ON() assertions to let the analyser know the 
assumptions on surrounding code, which has its own benefits.



static inline u32 __item_offset(u32 val)
{
BUG_ON(val > INT_MAX / ITEM_SIZE_PER_UNIT);
return val * ITEM_SIZE_PER_UNIT;
}


Obviously, the effort involved is still high. Maybe if someone as a pet 
project proves first that something in this direction is actually worth 
the effort (by uncovering a heap of bugs), one could offer this kind of 
check as an opt-in.



Best wishes,

  jonas




Re: [PATCH 1/1] wifi: mac80211: Avoid address calculations via out of bounds array indexing

2024-05-17 Thread Johannes Berg
On Thu, 2024-05-16 at 20:23 -0400, Kenton Groombridge wrote:
> req->n_channels must be set before req->channels[] can be used.
> Additionally, memory addresses after the "channels" array need to be
> calculated from the allocation base ("request") instead of the first
> "out of bounds" index of "channels" to avoid a runtime bounds check
> warning.

Thanks. Can you please drop the cfg80211 parts from this to match the
subject, the code there is broken in other ways too, I have a fix for
all of that:
https://patchwork.kernel.org/project/linux-wireless/patch/20240510113738.4190692ef4ee.I0cb19188be17a8abd029805e3373c0ac214c@changeid/

johannes




Re: Extending Linux' Coverity model and also cover aarch64

2024-05-17 Thread Manthey, Norbert
On Thu, 2024-05-16 at 12:20 -0700, Kees Cook wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender
> and know the content is safe.
> 
> 
> 
> On Thu, May 16, 2024 at 03:28:16PM +, Manthey, Norbert wrote:
> > we published an extension for the Coverity model that is used by
> > the
> > CoverityScan setup for the Linux kernel [1]. We have been using
> > this
> > extension to analyze the 6.1 kernel branch, and reported some fixes
> > to
> > the upstream code base that are based on this model [2]. Feel free
> > to
> > merge the pull request, and update the model in the CoverityScan
> > setup.
> > We do not have access to that project to perform these updates
> > ourselves.
> 
> Thanks for this! I'll get it loaded into the Linux-Next scanner.

Nice, thanks!

> 
> > To increase the analysis coverage to aarch64, we analyzed a x86 and
> > a
> > aarch64 configuration. The increased coverage is achieved by using
> > re-
> > configuration and cross-compilation during the analysis build. If
> > you
> > are interested in this setup we can share the Dockerfile and script
> > we
> > used for this process.
> 
> We've only got access to the free Coverity scanner, but it would be
> nice
> to see if there was anything specific to arm64.

Yes, I understand. Can you show how that free scanner is used? We
tweaked the command we fed into the "cov-build" tool. This tool should
be part of the scanner (if I remember that correctly).

> 
> > To prevent regressions in backports to LTS kernels, we wondered
> > whether
> > the community is interested in setting up CoverityScan projects for
> > older kernel releases. Would such an extension be useful to show
> > new
> > defects in addition to the current release testing?
> 
> The only one we (lightly) manage right now is the linux-next scanner.
> If
> other folks want to host scanners for -stable kernels, that would be
> interesting, yes.

Can you share explain or share pointers to how the current setup works?
If I understand that better, we can think about how to process the
other kernels.

Best,
Norbert

> 
> -Kees
> 
> --
> Kees Cook




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597