Re: [PATCH] libfs: fix accidental overflow in offset calculation
On Fri, May 10, 2024 at 03:26:08AM +0000, Justin Stitt wrote:

> This feels like a case of accidental correctness. You demonstrated that
> even with overflow we end up going down a control path that returns an
> error code so all is good.

No.  It's about a very simple arithmetical fact: the smallest number that
wraps to 0 is 2^N, which is more than twice the maximal signed N-bit value.
So wraparound on adding a signed N-bit value to a non-negative signed
N-bit value will always end up with a negative result.

That's *NOT* hard math.  Really.

As for the rest...  SEEK_CUR semantics is "seek to current position +
offset"; just about any ->llseek() instance will have that shape -
calculate the position we want to get to, then forget about the
difference between SEEK_SET and SEEK_CUR.  So noticing that wraparound
ends with a negative value is enough - we reject a straight SEEK_SET to
negatives anyway, so no extra logic is needed.

> However, I think finding the solution
> shouldn't require as much mental gymnastics. We clearly don't want our
> file offsets to wraparound and a plain-and-simple check for that lets
> readers of the code understand this.

No comments that would be suitable for any kind of polite company.
Re: [PATCH] libfs: fix accidental overflow in offset calculation
Hi,

On Fri, May 10, 2024 at 02:04:51AM +0100, Al Viro wrote:
> On Fri, May 10, 2024 at 01:49:06AM +0100, Al Viro wrote:
> > On Fri, May 10, 2024 at 12:35:51AM +0000, Justin Stitt wrote:
> > > @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
> > >  	struct dentry *dentry = file->f_path.dentry;
> > >  	switch (whence) {
> > >  		case 1:
> > > -			offset += file->f_pos;
> > > +			/* cannot represent offset with loff_t */
> > > +			if (check_add_overflow(offset, file->f_pos, &offset))
> > > +				return -EOVERFLOW;
> >
> > Instead of -EINVAL it correctly returns in such cases?  Why?
>
> We have file->f_pos in range 0..LLONG_MAX.  We are adding a value in
> range LLONG_MIN..LLONG_MAX.  The sum (in $\Bbb Z$) cannot be below
> LLONG_MIN or above 2 * LLONG_MAX, so if it can't be represented by loff_t,
> it must have been in range LLONG_MAX + 1 .. 2 * LLONG_MAX.  Result of
> wraparound would be equal to that sum - 2 * LLONG_MAX - 2, which is going
> to be no greater than -2.  We will run
> 	fallthrough;
> 	case 0:
> 		if (offset >= 0)
> 			break;
> 		fallthrough;
> 	default:
> 		return -EINVAL;
> and fail with -EINVAL.

This feels like a case of accidental correctness. You demonstrated that
even with overflow we end up going down a control path that returns an
error code so all is good. However, I think finding the solution
shouldn't require as much mental gymnastics. We clearly don't want our
file offsets to wraparound and a plain-and-simple check for that lets
readers of the code understand this.

> Could you explain why would -EOVERFLOW be preferable and why should we
> engage in that bit of cargo cult?

I noticed some patterns in fs/ regarding -EOVERFLOW and thought this was
a good application of this particular error code. Some examples:

read_write.c::do_sendfile()
	...
	if (unlikely(pos + count > max)) {
		retval = -EOVERFLOW;
		if (pos >= max)
			goto fput_out;
		count = max - pos;
	}

read_write.c::generic_copy_file_checks()
	...
	/* Ensure offsets don't wrap. */
	if (pos_in + count < pos_in || pos_out + count < pos_out)
		return -EOVERFLOW;
	...

amongst others.

So to answer your question, I don't have strong feelings about what the
error code should be; I was just attempting to match patterns I had seen
within this section of the codebase when handling overflow/wraparound.
If -EOVERFLOW is technically incorrect or if it is just bad style, I'm
happy to send a new version swapping it over to -EINVAL.

Thanks
Justin
Re: [PATCH] libfs: fix accidental overflow in offset calculation
On Fri, May 10, 2024 at 01:49:06AM +0100, Al Viro wrote:
> On Fri, May 10, 2024 at 12:35:51AM +0000, Justin Stitt wrote:
> > @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
> >  	struct dentry *dentry = file->f_path.dentry;
> >  	switch (whence) {
> >  		case 1:
> > -			offset += file->f_pos;
> > +			/* cannot represent offset with loff_t */
> > +			if (check_add_overflow(offset, file->f_pos, &offset))
> > +				return -EOVERFLOW;
>
> Instead of -EINVAL it correctly returns in such cases?  Why?

We have file->f_pos in range 0..LLONG_MAX.  We are adding a value in
range LLONG_MIN..LLONG_MAX.  The sum (in $\Bbb Z$) cannot be below
LLONG_MIN or above 2 * LLONG_MAX, so if it can't be represented by loff_t,
it must have been in range LLONG_MAX + 1 .. 2 * LLONG_MAX.  Result of
wraparound would be equal to that sum - 2 * LLONG_MAX - 2, which is going
to be no greater than -2.  We will run
	fallthrough;
	case 0:
		if (offset >= 0)
			break;
		fallthrough;
	default:
		return -EINVAL;
and fail with -EINVAL.

Could you explain why would -EOVERFLOW be preferable and why should we
engage in that bit of cargo cult?
Re: [PATCH] libfs: fix accidental overflow in offset calculation
On Fri, May 10, 2024 at 12:35:51AM +0000, Justin Stitt wrote:
> @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
>  	struct dentry *dentry = file->f_path.dentry;
>  	switch (whence) {
>  		case 1:
> -			offset += file->f_pos;
> +			/* cannot represent offset with loff_t */
> +			if (check_add_overflow(offset, file->f_pos, &offset))
> +				return -EOVERFLOW;

Instead of -EINVAL it correctly returns in such cases?  Why?
[PATCH] libfs: fix accidental overflow in offset calculation
Running syzkaller with the newly reintroduced signed integer overflow
sanitizer gives this report:

[ 6008.464680] UBSAN: signed-integer-overflow in ../fs/libfs.c:149:11
[ 6008.468664] 9223372036854775807 + 16387 cannot be represented in type 'loff_t' (aka 'long long')
[ 6008.474167] CPU: 1 PID: 1214 Comm: syz-executor.0 Not tainted 6.8.0-rc2-00041-gec7cb1052e44-dirty #15
[ 6008.479662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 6008.485276] Call Trace:
[ 6008.486819]  <TASK>
[ 6008.488258]  dump_stack_lvl+0x93/0xd0
[ 6008.490535]  handle_overflow+0x171/0x1b0
[ 6008.492957]  dcache_dir_lseek+0x3bf/0x3d0
...

Use the check_add_overflow() helper to gracefully check for
unintentional overflow causing wraparound in our offset calculations.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/359
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt
---
Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Here's the syzkaller reproducer:

| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, &(0x7f00)='/sys/kernel/tracing', 0x0, 0x0)
| lseek(r0, 0x4003, 0x0)
| lseek(r0, 0x7fff, 0x1)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/libfs.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 3a6f2cb364f8..3fdc1aaddd45 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
 	struct dentry *dentry = file->f_path.dentry;
 	switch (whence) {
 		case 1:
-			offset += file->f_pos;
+			/* cannot represent offset with loff_t */
+			if (check_add_overflow(offset, file->f_pos, &offset))
+				return -EOVERFLOW;
 			fallthrough;
 		case 0:
 			if (offset >= 0)
@@ -422,7 +424,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 {
 	switch (whence) {
 	case SEEK_CUR:
-		offset += file->f_pos;
+		/* cannot represent offset with loff_t */
+		if (check_add_overflow(offset, file->f_pos, &offset))
+			return -EOVERFLOW;
 		fallthrough;
 	case SEEK_SET:
 		if (offset >= 0)

---
base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc
change-id: 20240509-b4-sio-libfs-67947244a4a3

Best regards,
--
Justin Stitt
[PATCH v2] fs: fix unintentional arithmetic wraparound in offset calculation
When running syzkaller with the newly reintroduced signed integer
overflow sanitizer we encounter this report:

[ 67.995501] UBSAN: signed-integer-overflow in ../fs/read_write.c:91:10
[ 68.67] 9223372036854775807 + 4096 cannot be represented in type 'loff_t' (aka 'long long')
[ 68.006266] CPU: 4 PID: 10851 Comm: syz-executor.5 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[ 68.012353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 68.018983] Call Trace:
[ 68.020803]  <TASK>
[ 68.022540]  dump_stack_lvl+0x93/0xd0
[ 68.025222]  handle_overflow+0x171/0x1b0
[ 68.028053]  generic_file_llseek_size+0x35b/0x380

amongst others:

UBSAN: signed-integer-overflow in ../fs/read_write.c:1657:12
142606336 - -9223372036854775807 cannot be represented in type 'loff_t' (aka 'long long')
...
UBSAN: signed-integer-overflow in ../fs/read_write.c:1666:11
9223372036854775807 - -9223231299366420479 cannot be represented in type 'loff_t' (aka 'long long')

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Fix the accidental overflow in these position and offset calculations by
checking for negative position values, using check_add_overflow()
helpers and clamping values to expected ranges.

Since @offset is later limited by @maxsize, we can proactively safeguard
against exceeding that value (and by extension avoiding integer
overflow):

loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
{
	if (offset < 0 && !unsigned_offsets(file))
		return -EINVAL;
	if (offset > maxsize)
		return -EINVAL;
	...

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/358
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt
---
Changes in v2:
- fix some more cases syzkaller found in read_write.c
- use min over min_t as the types are the same
- Link to v1: https://lore.kernel.org/r/20240509-b4-sio-read_write-v1-1-06bec2022...@google.com
---
Here's the syzkaller reproducer:

| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, &(0x7f00)='/sys/kernel/address_bits', 0x0, 0x98)
| lseek(r0, 0x7fff, 0x2)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/read_write.c  | 18 +++---
 fs/remap_range.c | 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d4c036e82b6c..d116e6e3eb3d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -88,7 +88,7 @@ generic_file_llseek_size(struct file *file, loff_t offset, int whence,
 {
 	switch (whence) {
 	case SEEK_END:
-		offset += eof;
+		offset = min(offset, maxsize - eof) + eof;
 		break;
 	case SEEK_CUR:
 		/*
@@ -105,7 +105,8 @@ generic_file_llseek_size(struct file *file, loff_t offset, int whence,
 		 * like SEEK_SET.
 		 */
 		spin_lock(&file->f_lock);
-		offset = vfs_setpos(file, file->f_pos + offset, maxsize);
+		offset = vfs_setpos(file, min(file->f_pos, maxsize - offset) +
+				    offset, maxsize);
 		spin_unlock(&file->f_lock);
 		return offset;
 	case SEEK_DATA:
@@ -1416,7 +1417,7 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
 	uint64_t count = *req_count;
-	loff_t size_in;
+	loff_t size_in, in_sum, out_sum;
 	int ret;

 	ret = generic_file_rw_checks(file_in, file_out);
@@ -1450,8 +1451,8 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
 		return -ETXTBSY;

-	/* Ensure offsets don't wrap. */
-	if (pos_in + count < pos_in || pos_out + count < pos_out
Re: [PATCH] fs: remove accidental overflow during wraparound check
On Thu, May 9, 2024 at 8:53 AM Jan Kara wrote:
> > @@ -319,8 +320,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >  	if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode))
> >  		return -ENODEV;
> >
> > -	/* Check for wrap through zero too */
> > -	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
> > +	/* Check for wraparound */
> > +	if (check_add_overflow(offset, len, &sum))
> > +		return -EFBIG;
> > +
> > +	/* Now, check bounds */
> > +	if (sum > inode->i_sb->s_maxbytes || sum < 0)
> >  		return -EFBIG;
>
> But why do you check for sum < 0? We know from previous checks offset >= 0
> && len > 0 so unless we overflow, sum is guaranteed to be > 0.

Fair enough. I suppose with the overflow check in place we can no longer
have a sum less than zero there. If nothing else, it tells readers of
this code what the domain of (offset+len) is. I don't mind sending a new
version, though.

> 								Honza
> --
> Jan Kara
> SUSE Labs, CR
[PATCH] fs: fix unintentional arithmetic wraparound in offset calculation
When running syzkaller with the newly reintroduced signed integer
overflow sanitizer we encounter this report:

[ 67.991989] ------------[ cut here ]------------
[ 67.995501] UBSAN: signed-integer-overflow in ../fs/read_write.c:91:10
[ 68.67] 9223372036854775807 + 4096 cannot be represented in type 'loff_t' (aka 'long long')
[ 68.006266] CPU: 4 PID: 10851 Comm: syz-executor.5 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[ 68.012353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 68.018983] Call Trace:
[ 68.020803]  <TASK>
[ 68.022540]  dump_stack_lvl+0x93/0xd0
[ 68.025222]  handle_overflow+0x171/0x1b0
[ 68.028053]  generic_file_llseek_size+0x35b/0x380
...

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Since @offset is later limited by @maxsize, we can proactively safeguard
against exceeding that value and also dodge some accidental overflow
(which may cause bad file access):

loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
{
	if (offset < 0 && !unsigned_offsets(file))
		return -EINVAL;
	if (offset > maxsize)
		return -EINVAL;
	...

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/358
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt
---
Here's the syzkaller reproducer:

| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, &(0x7f00)='/sys/kernel/address_bits', 0x0, 0x98)
| lseek(r0, 0x7fff, 0x2)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d4c036e82b6c..10c3eaa5ef55 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -88,7 +88,7 @@ generic_file_llseek_size(struct file *file, loff_t offset, int whence,
 {
 	switch (whence) {
 	case SEEK_END:
-		offset += eof;
+		offset = min_t(loff_t, offset, maxsize - eof) + eof;
 		break;
 	case SEEK_CUR:
 		/*

---
base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc
change-id: 20240509-b4-sio-read_write-04a17d40620e

Best regards,
--
Justin Stitt
RE: [RFC] Mitigating unexpected arithmetic overflow
...

> I think that would be a completely different area that might be worth
> looking at: instrumenting implicit casts for "drops bits". I'm afraid
> that it's just *so* common that we might not be able to do that
> sanely.

Things like:
	buf[0] = val;
	buf[1] = val >>= 8;
	buf[2] = val >>= 8;
	buf[3] = val >>= 8;
for writing a value little-endian and potentially misaligned.
Really doesn't want any annotation.

I've also seen code like:
	buf[0] = (unsigned char)(val & 0xff);
not only ugly but it got compiled to:
	val &= 0xff		// for the &
	val &= 0xff		// for the cast
	byte write to memory
Modern gcc doesn't do that, but...

There are some spurious casts that drop bits. I found plenty of dubious
min_t(u8/u16,...) examples. (Well, they are dubious; some are just a lot
more dubious than others.) The problem is that every one needs careful
inspection just in case the strange behaviour is required, like
	min_t(u8, val - 1, lo_lim - 1)
which treats a lo_lim of zero as 'not a limit' and I think was ok.

A slow, concerted effort to remove min_t() calls wouldn't be a bad thing.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
RE: [RFC] Mitigating unexpected arithmetic overflow
...

> Going the other way is similar:
>
>     all_bits = low_bits + (((u64) high_bits << 16) << 16);
>
> and again, the compiler will recognize this idiom and do the right
> thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> the high bit noise away).

On a 32bit system the compiler might not do the expected thing.

I had terrible trouble with the 32bit div_u64_u32() code I was playing
with getting the compiler to do 'something sensible' for that. I just
couldn't get it to stop generating two 64bit values (in two register
pairs) and then oring them together. I didn't try using a union - that
might work.

On x64 the asm "A" (edx:eax) and "a" and "d" constraints will DTRT but
force the values into edx:eax which is ok if you are doing divides.

	David
Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently
On Thu, 9 May 2024 23:24:20 +0300
Mike Rapoport wrote:
> On Thu, May 09, 2024 at 01:31:22PM -0400, Steven Rostedt wrote:
> > On Thu, 9 May 2024 00:00:23 -0400
> > Steven Rostedt wrote:
> >
> > > I tried this approach and it unfortunately picks a different physical
> > > location every time :-(
> > >
> > > So it is either adding to e820 tables or we create a new way to
> > > allocate memory at early boot up.
> >
> > Hmm, now I'm testing it more and it always seems to end up in the same
> > location. I'm not sure why it failed the first three times I tried it :-/
>
> If the kernel and the command line were the same, they shouldn't have.
> e820 translates to memblock the same way every boot and the early
> allocations also do not change from boot to boot.
>
> Could it be that three runs that failed had some differences in the kernel
> parameters?

I wonder if KASLR caused it or not. But when I first added it to another
machine, it failed to get the same address on the second boot, but was
fine after that. Could be just something with my setup.

I'm going to backport this to 5.15 and test this on a Chromebook and see
what happens there (as that's the motivation behind this work).

-- Steve
Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently
On Thu, May 09, 2024 at 01:31:22PM -0400, Steven Rostedt wrote:
> On Thu, 9 May 2024 00:00:23 -0400
> Steven Rostedt wrote:
>
> > I tried this approach and it unfortunately picks a different physical
> > location every time :-(
> >
> > So it is either adding to e820 tables or we create a new way to
> > allocate memory at early boot up.
>
> Hmm, now I'm testing it more and it always seems to end up in the same
> location. I'm not sure why it failed the first three times I tried it :-/

If the kernel and the command line were the same, they shouldn't have.
e820 translates to memblock the same way every boot and the early
allocations also do not change from boot to boot.

Could it be that three runs that failed had some differences in the kernel
parameters?

> -- Steve

--
Sincerely yours,
Mike.
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, May 09, 2024 at 12:15:40PM -0700, Linus Torvalds wrote:
> On Thu, 9 May 2024 at 11:48, Al Viro wrote:
> >
> > FWIW, the thing that somewhat worries me about having a helper along
> > the lines of combine_to_u64(low, high) is that
> > 	foo->splat = combine_to_u64(something, something_else);
> > would be inviting hard-to-catch brainos -
> > 	foo->splat = combine_to_u64(something_else, something);
>
> Yeah, we'd have to be very clear about naming and ordering. So it
> would probably have to be something like
>
>     result = combine_to_u64_hi_lo(high, low);
>
> to be easy to use.
>
> The good news is that if you *do* get it wrong despite clear naming,
> the resulting value will be so obviously wrong that it's generally a
> "Duh!" thing if you do any testing what-so-ever.
>
> Of course, I say that as somebody who always points out that I haven't
> tested my own patches at all, and they are "something like this,
> perhaps?".
>
> But having "hi_lo" kind of naming would hopefully make it really
> obvious even when just looking at the source code.

Or something like
	result = to_high32(high) | to_low32(low);
perhaps? ;-)

Re amusing things found by grepping:
		unsafe_get_user(lo, &__c->sig[1], label);	\
		unsafe_get_user(hi, &__c->sig[0], label);	\
		__s->sig[0] = hi | (((long)lo) << 32);		\
(compat.h, be64 unsafe_get_compat_sigset())

It is correct, actually, but here 'hi' is 'signals in range 0..31' and
'lo' - 'signals in range 32..63'. Introduced in fb05121fd6a2 "signal:
Add unsafe_get_compat_sigset()", looks like nobody had read it carefully
enough for a WTF moment - at least no replies along the lines of 'might
be a good idea to use less confusing names' anywhere on lore...
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, 9 May 2024 at 11:48, Al Viro wrote:
>
> FWIW, the thing that somewhat worries me about having a helper along
> the lines of combine_to_u64(low, high) is that
> 	foo->splat = combine_to_u64(something, something_else);
> would be inviting hard-to-catch brainos -
> 	foo->splat = combine_to_u64(something_else, something);

Yeah, we'd have to be very clear about naming and ordering. So it
would probably have to be something like

    result = combine_to_u64_hi_lo(high, low);

to be easy to use.

The good news is that if you *do* get it wrong despite clear naming,
the resulting value will be so obviously wrong that it's generally a
"Duh!" thing if you do any testing what-so-ever.

Of course, I say that as somebody who always points out that I haven't
tested my own patches at all, and they are "something like this,
perhaps?".

But having "hi_lo" kind of naming would hopefully make it really
obvious even when just looking at the source code.

	Linus
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, May 09, 2024 at 11:39:04AM -0700, Linus Torvalds wrote:
> .. it might also actually be a good idea *IF* we were to have some
> kind of "implicit cast drops bits" warning, in that the compiler for
> that case wouldn't remove the upper bits calculation, but would
> trigger a warning if they are non-zero.
>
> So there are actually potential advantages to just always apparently
> doing the full 64-bit arithmetic.
>
> Without debug warnings, it's a no-op that the compiler will just skip.
> And with some hypothetical debug flag, it would be a "you are now
> losing the high bits of the time value when assigning the result to a
> limited 32-bit time_t" warning.

FWIW, the thing that somewhat worries me about having a helper along
the lines of combine_to_u64(low, high) is that
	foo->splat = combine_to_u64(something, something_else);
would be inviting hard-to-catch brainos -
	foo->splat = combine_to_u64(something_else, something);
would be very hard to catch on RTFS, especially when you'd been staring
at that code for a long time. Explicitly spelled out, it would be
obvious which goes into bits 0..31 and which into bits 32..63.
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, 9 May 2024 at 11:08, Linus Torvalds wrote:
>
> Any half-way decent compiler will end up optimizing away the shifts
> and adds for the high bits because they see the assignment to
> 'all_bits'. There's no point in generating high bits that just get
> thrown away.

.. it might also actually be a good idea *IF* we were to have some
kind of "implicit cast drops bits" warning, in that the compiler for
that case wouldn't remove the upper bits calculation, but would
trigger a warning if they are non-zero.

So there are actually potential advantages to just always apparently
doing the full 64-bit arithmetic.

Without debug warnings, it's a no-op that the compiler will just skip.
And with some hypothetical debug flag, it would be a "you are now
losing the high bits of the time value when assigning the result to a
limited 32-bit time_t" warning.

	Linus
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, 9 May 2024 at 10:54, Al Viro wrote:
>
> On Thu, May 09, 2024 at 08:38:28AM -0700, Linus Torvalds wrote:
>
> > Going the other way is similar:
> >
> >     all_bits = low_bits + (((u64) high_bits << 16) << 16);
> >
> > and again, the compiler will recognize this idiom and do the right
> > thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> > the high bit noise away).
>
> Umm... That would make sense if it was
> 	all_bits = low_bits + (((T) high_bits << 16) << 16);
> with possibly 32bit T. But the way you wrote that (with u64) it's
> pointless - u64 _can_ be shifted by 32 just fine.

Casting to 'T' is probably a clearer option but doesn't work very well
in helper functions that may not know what the final type is.

Any half-way decent compiler will end up optimizing away the shifts
and adds for the high bits because they see the assignment to
'all_bits'. There's no point in generating high bits that just get
thrown away.

	Linus
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, May 09, 2024 at 08:38:28AM -0700, Linus Torvalds wrote:
> Going the other way is similar:
>
>     all_bits = low_bits + (((u64) high_bits << 16) << 16);
>
> and again, the compiler will recognize this idiom and do the right
> thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> the high bit noise away).

Umm... That would make sense if it was
	all_bits = low_bits + (((T) high_bits << 16) << 16);
with possibly 32bit T. But the way you wrote that (with u64) it's
pointless - u64 _can_ be shifted by 32 just fine.
Re: [PATCH v2] perf/ring_buffer: Prefer struct_size over open coded arithmetic
Hi Kees and Christophe,

First of all, thanks for the reviews, comments and advice.

On Mon, May 06, 2024 at 09:23:15AM -0700, Kees Cook wrote:
> On Sun, May 05, 2024 at 07:31:24PM +0200, Erick Archer wrote:
> > On Sun, May 05, 2024 at 05:24:55PM +0200, Christophe JAILLET wrote:
> > > Le 05/05/2024 à 16:15, Erick Archer a écrit :
> > > > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > > > index 4013408ce012..080537eff69f 100644
> > > > --- a/kernel/events/ring_buffer.c
> > > > +++ b/kernel/events/ring_buffer.c
> > > > @@ -822,9 +822,7 @@ struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
> > > >  	unsigned long size;
> > >
> > > Hi,
> > >
> > > Should size be size_t?
> >
> > I'm sorry, but I don't have enough knowledge to answer this question.
> > The "size" variable is used as a return value by struct_size and as
> > a parameter to the order_base_2() and kzalloc_node() functions.
>
> For Linux, size_t and unsigned long are the same (currently).
> Pedantically, yes, this should be size_t, but it's the same.

Thanks for this clarification. I will change the type for the next
version.

> > > [...]
> > >	all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
> > >	if (!all_buf)
> > >		goto fail_all_buf;
> > >
> > >	rb->user_page = all_buf;
> > >	rb->data_pages[0] = all_buf + PAGE_SIZE;
> > >	if (nr_pages) {			<--- here
> > >		rb->nr_pages = 1;	<---
> > >		rb->page_order = ilog2(nr_pages);
> > >	}
> > [...]
> > I think that we don't need to deal with the "nr_pages = 0" case
> > since the flex array will always have a length of one.
> >
> > Kees, can you help us with this?
>
> Agh, this code hurt my head for a while.
>
> all_buf contains "nr_pages + 1" pages. all_buf gets attached to
> rb->user_page, and then rb->data_pages[0] points to the second page in
> all_buf... which means, I guess, that rb->data_pages does only have 1
> entry.
>
> However, the nr_pages == 0 case is weird. Currently, data_pages[0] will
> still get set (which points ... off the end of all_buf). If we
> unconditionally set rb->nr_pages to 1, we're changing the behavior. If
> we _don't_ set rb->data_pages[0], we're changing the behavior, but I
> think it's an invalid pointer anyway, so this is the safer change to
> make.

Thanks for explaining things well.

> I suspect the right replacement is:
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 4013408ce012..7d638ce76799 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -916,15 +916,11 @@ void rb_free(struct perf_buffer *rb)
>  struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  {
>  	struct perf_buffer *rb;
> -	unsigned long size;
>  	void *all_buf;
>  	int node;
>
> -	size = sizeof(struct perf_buffer);
> -	size += sizeof(void *);
> -
>  	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
> -	rb = kzalloc_node(size, GFP_KERNEL, node);
> +	rb = kzalloc_node(struct_size(rb, data_pages, 1), GFP_KERNEL, node);
>  	if (!rb)
>  		goto fail;
>
> @@ -935,9 +931,9 @@ struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  		goto fail_all_buf;
>
>  	rb->user_page = all_buf;
> -	rb->data_pages[0] = all_buf + PAGE_SIZE;
>  	if (nr_pages) {
>  		rb->nr_pages = 1;
> +		rb->data_pages[0] = all_buf + PAGE_SIZE;
>  		rb->page_order = ilog2(nr_pages);
>  	}

Ok, I'll do it like this for the next version.

> Also, why does rb_alloc() take an "int" nr_pages? The only caller has an
> unsigned long argument for nr_pages. Nothing checks for >INT_MAX that I
> can find.

Thanks for letting me know. I will take a look.

> --
> Kees Cook

Again, thanks,
Erick
Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently
On Thu, 9 May 2024 00:00:23 -0400
Steven Rostedt wrote:

> I tried this approach and it unfortunately picks a different physical
> location every time :-(
>
> So it is either adding to e820 tables or we create a new way to
> allocate memory at early boot up.

Hmm, now I'm testing it more and it always seems to end up in the same
location. I'm not sure why it failed the first three times I tried it :-/

-- Steve
Re: [PATCH] fs: remove accidental overflow during wraparound check
On Tue 07-05-24 23:17:57, Justin Stitt wrote:
> Running syzkaller with the newly enabled signed integer overflow
> sanitizer produces this report:
>
> [  195.401651] [ cut here ]
> [  195.404808] UBSAN: signed-integer-overflow in ../fs/open.c:321:15
> [  195.408739] 9223372036854775807 + 562984447377399 cannot be represented in type 'loff_t' (aka 'long long')
> [  195.414683] CPU: 1 PID: 703 Comm: syz-executor.0 Not tainted 6.8.0-rc2-00039-g14de58dbe653-dirty #11
> [  195.420138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [  195.425804] Call Trace:
> [  195.427360]
> [  195.428791]  dump_stack_lvl+0x93/0xd0
> [  195.431150]  handle_overflow+0x171/0x1b0
> [  195.433640]  vfs_fallocate+0x459/0x4f0

Well, we compile the kernel with -fno-strict-overflow for a reason so I
wouldn't consider this a bug. But check_add_overflow() is easier to
digest since we don't have to worry about type details so I'm for this
change.

> @@ -319,8 +320,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode))
>  		return -ENODEV;
>
> -	/* Check for wrap through zero too */
> -	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
> +	/* Check for wraparound */
> +	if (check_add_overflow(offset, len, &sum))
> +		return -EFBIG;
> +
> +	/* Now, check bounds */
> +	if (sum > inode->i_sb->s_maxbytes || sum < 0)
>  		return -EFBIG;

But why do you check for sum < 0? We know from previous checks that
offset >= 0 && len > 0, so unless we overflow, sum is guaranteed to
be > 0.

								Honza
--
Jan Kara
SUSE Labs, CR
Re: [RFC] Mitigating unexpected arithmetic overflow
On Thu, 9 May 2024 at 07:09, Theodore Ts'o wrote:
>
>	struct ext2_super {
>		...
>		__u32	time_lo;
>		__u32	time_high;
>		...
>	}
>
>	time_t now;
>
>	sb->time_low = now;
>	sb->time_high = now >> 32;
>
> This is obviously (to a human) correct,

Actually, it's not correct at all.

If time_t is 32-bit, that "now >> 32" is broken due to how C shifts are
defined. For a 32-bit type, only shifts of 0-31 bits are well-defined
(due to hardware differences).

> but because of stupid compiler
> tricks, in order to silence compiler-level and ubsan complaints, this
> got turned into:
>
>	sb->time_low = now & 0xffffffff;
>	#if (SIZEOF_TIME_T > 4)
>	sb->time_high = (now >> 32) & EXT4_EPOCH_MASK;
>	#else
>	sb->time_high = 0;
>	#endif

This is the wrong solution.

But the solution to the undefined shift isn't some #ifdef. The idiomatic
C solution is to do

	high_bits = (all_bits >> 16) >> 16;

which works even if 'all_bits' is 32 bit, and the compiler will know
this idiom, and turn it into just 0.

Going the other way is similar:

	all_bits = low_bits + (((u64) high_bits << 16) << 16);

and again, the compiler will recognize this idiom and do the right thing
(and if 'all_bits' is only 32-bit, the compiler will optimize the high
bit noise away).

And yes, this is not great, but there you have it: C was designed to be
a high-level assembler, and you see the effects of "this is how many
hardware shifters work". The shift instructions on many (most?)
architectures end up being limited to the word width.

We actually have some helpers for this in :

	#define upper_32_bits(n) ((u32)(((n) >> 16) >> 16))
	#define lower_32_bits(n) ((u32)((n) & 0xffffffff))

but we don't have that "combine 32 bits into 64 bits" helper. Maybe
worth adding.

		Linus
Re: [RFC] Mitigating unexpected arithmetic overflow
On Wed, May 08, 2024 at 11:11:35PM -0700, Kees Cook wrote:
> > I think it would be interesting in general to have some kind of
> > warning for "implicit cast drops bits".
> >
> > I fear that we'd have an enormous amount of them, and maybe they'd be
> > unsolvable without making the code *much* uglier (and sometimes the
> > fix might be to add an explicit cast to document intentionally dropped
> > bits, but explicit casts have their own issues).

Speaking of which, I recently had to work around an overactive compiler
UBSAN which complained about this:

	struct ext2_super {
		...
		__u32	time_lo;
		__u32	time_high;
		...
	}

	time_t now;

	sb->time_low = now;
	sb->time_high = now >> 32;

This is obviously (to a human) correct, but because of stupid compiler
tricks, in order to silence compiler-level and ubsan complaints, this
got turned into:

	sb->time_low = now & 0xffffffff;
	#if (SIZEOF_TIME_T > 4)
	sb->time_high = (now >> 32) & EXT4_EPOCH_MASK;
	#else
	sb->time_high = 0;
	#endif

and in the opposite case, I was forced to write:

	#if (SIZEOF_TIME_T == 4)
	return *lo;
	#else
	return ((time_t)(*hi) << 32) | *lo;
	#endif

... and this made me very sad.

Grumble,

					- Ted
Re: [RFC] Mitigating unexpected arithmetic overflow
On Wed, May 08, 2024 at 04:47:25PM -0700, Linus Torvalds wrote:
> For example, the most common case of overflow we've ever had has very
> much been array indexing. Now, sometimes that has actually been actual
> undefined behavior, because it's been overflow in signed variables,
> and those are "easy" to find in the sense that you just say "no, can't
> do that". UBSAN finds them, and that's good.

Yes, the array bounds checking work has been going well. It's been about
5 years now, plucking out all the weird compiler behaviors, refactoring
code to get rid of ambiguous patterns, etc. It's turtles and compiler
extensions all the way down. It's been a real education on "this will be
easy: the compiler already has all the information". Only to find, no,
it's not, because no one has tried to make any of it work together
sanely in 50 years. But we're finally at the tail end of it, with clear
and unambiguous semantics and mitigations now.

> So I'd suggest seeing if we can catch the cases that *technically*
> aren't UB, but match dangerous overflow situations. IOW, start with
> very limited and very targeted things to look out for.

Yes, exactly. What I'd said in the RFC was that, from a terminology
perspective, I don't want to focus on "undefined behavior", as this
means a very specific thing to the compiler folks. I'm focused on what
_unexpected_ things actually happen in the real world that lead to
exploitable flaws.

> So *that* I feel could be something where you can warn without a ton
> of compiler smarts at all. If you see an *implicit* cast to unsigned
> and then the subsequent operations wraps around, it's probably worth
> being a lot more worried about.

I agree, implicit casts are worth exploring. It's a more limited set of
behaviors, but I remain skeptical about the utility of catching them
since the visibility of those kinds of promotions can quickly go out of
scope.
> And basically try to catch cases where arithmetic is explicitly used
> for a size calculation: things like 'kmalloc()' etc are the obvious
> things. And yes, the result will then have an (implicit) cast to
> 'size_t' as part of the calling convention.

We've been doing this kind of static analysis of allocation size
mistakes for a while, but it remains not very well introspected by the
compiler, making it a long game of whack-a-mole. And generally speaking,
all the wrapping around has already happened way before it hits the
allocator. We've mostly dealt with all the low-hanging code pattern
fruit in this area, but it could be a place to add type-based compiler
smarts.

> And, finally, simply pointer arithmetic. Again, this is *not* some
> "omniscience required". This is something very obvious where the
> compiler is *very* aware of it being pointer arithmetic, because
> pointer arithmetic has that implicit (and important) "take size of
> object into account".

Yes, that is already in progress too. Now that we've got __counted_by
for flex arrays, the next addition is __counted_by for pointers.
Coverage there will continue to grow. There is, unfortunately, rather a
lot of alloc-and-forget going on in the kernel where object sizes are
just thrown away. We've done some refactoring in places to add it, but I
suspect the need for that will become more common once we gain coverage
for pointer offsets.

> So again, if you see an integer expression that leads to pointer
> arithmetic, at that point the overflow very much is relevant.

Right, and we've already been doing all of these things. They do all
share the common root cause, though: unexpected wrap-around. But I hear
you: don't go after the root cause, find high-signal things that are
close to it.

> So what I object to - loudly - is some completely bogus argument of
> "integer wraparound is always wrong".

Just so it's not mistaken, I didn't say "always": I said "can be". This
was never a proposal to outlaw all wrap-around. :P

> But catching wrap-around in *specific* cases, if you can argue for why
> it's wrong in *those* cases: that sounds fine.

Right. I mean, that is basically what I proposed. It was just over a
much larger class than it seems you'll tolerate. :P

You're asking to keep the scope smaller, and aimed at places where we
know it can go wrong. I still think this is backwards from what would
be best coverage. For the highest level of flaw mitigation, we want to
exclude that which is proven safe, not try to catch things that might
be unsafe. But I do try to be pragmatic, and it doesn't sound like that
approach will be accepted here.

> So in *that* case, you actually have a much more interesting
> situation. Not wrap-around, not overflow, but "implicit cast drops
> significant bits".

Yup! This is what I was talking about in the RFC: implicit integer
truncation. It *is* very nasty, I agree. We can get started on this
next then. I poked around at it earlier this year[1]. My plan was to
tackle it after finishing signed integer overflows (which is well
underway).

> And yes, I do