Re: [PATCH] libfs: fix accidental overflow in offset calculation

2024-05-09 Thread Al Viro
On Fri, May 10, 2024 at 03:26:08AM +, Justin Stitt wrote:

> This feels like a case of accidental correctness. You demonstrated that
> even with overflow we end up going down a control path that returns an
> error code so all is good.

No.  It's about a very simple arithmetical fact: the smallest number that
wraps to 0 is 2^N, which is more than twice the maximal signed N-bit
value.  So wraparound on adding a signed N-bit to non-negative signed N-bit
will always end up with negative result.  That's *NOT* a hard math.  Really.

As for the rest... SEEK_CUR semantics is "seek to current position + offset";
just about any ->llseek() instance will have that shape - calculate the
position we want to get to, then forget about the difference between
SEEK_SET and SEEK_CUR.  So noticing that wraparound ends with negative
is enough - we reject straight SEEK_SET to negatives anyway, so no
extra logics is needed.

> However, I think finding the solution
> shouldn't require as much mental gymnastics. We clearly don't want our
> file offsets to wraparound and a plain-and-simple check for that lets
> readers of the code understand this.

No comments that would be suitable for any kind of polite company.



Re: [PATCH] libfs: fix accidental overflow in offset calculation

2024-05-09 Thread Justin Stitt
Hi,

On Fri, May 10, 2024 at 02:04:51AM +0100, Al Viro wrote:
> On Fri, May 10, 2024 at 01:49:06AM +0100, Al Viro wrote:
> > On Fri, May 10, 2024 at 12:35:51AM +, Justin Stitt wrote:
> > > @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t 
> > > offset, int whence)
> > >   struct dentry *dentry = file->f_path.dentry;
> > >   switch (whence) {
> > >   case 1:
> > > - offset += file->f_pos;
> > > + /* cannot represent offset with loff_t */
> > > + if (check_add_overflow(offset, file->f_pos, ))
> > > + return -EOVERFLOW;
> > 
> > Instead of -EINVAL it correctly returns in such cases?  Why?
> 
> We have file->f_pos in range 0..LLONG_MAX.  We are adding a value in
> range LLONG_MIN..LLONG_MAX.  The sum (in $\Bbb Z$) cannot be below
> LLONG_MIN or above 2 * LLONG_MAX, so if it can't be represented by loff_t,
> it must have been in range LLONG_MAX + 1 .. 2 * LLONG_MAX.  Result of
> wraparound would be equal to that sum - 2 * LLONG_MAX - 2, which is going
> to be in no greater than -2.  We will run
>   fallthrough;
>   case 0:
>   if (offset >= 0)
>   break;
>   fallthrough;
>   default:
>   return -EINVAL;
> and fail with -EINVAL.

This feels like a case of accidental correctness. You demonstrated that
even with overflow we end up going down a control path that returns an
error code so all is good. However, I think finding the solution
shouldn't require as much mental gymnastics. We clearly don't want our
file offsets to wraparound and a plain-and-simple check for that lets
readers of the code understand this.

> 
> Could you explain why would -EOVERFLOW be preferable and why should we
> engage in that bit of cargo cult?

I noticed some patterns in fs/ regarding -EOVERFLOW and thought this was
a good application of this particular error code.

Some examples:

read_write.c::do_sendfile()
  ...
if (unlikely(pos + count > max)) {
retval = -EOVERFLOW;
if (pos >= max)
goto fput_out;
count = max - pos;
}

read_write.c::generic_copy_file_checks()
  ...
/* Ensure offsets don't wrap. */
if (pos_in + count < pos_in || pos_out + count < pos_out)
return -EOVERFLOW;

... amongst others.

So to answer your question, I don't have strong feelings about what the
error code should be; I was just attempting to match patterns I had seen
within this section of the codebase when handling overflow/wraparound.

If -EOVERFLOW is technically incorrect or if it is just bad style, I'm
happy to send a new version swapping it over to -EINVAL

Thanks
Justin



Re: [PATCH] libfs: fix accidental overflow in offset calculation

2024-05-09 Thread Al Viro
On Fri, May 10, 2024 at 01:49:06AM +0100, Al Viro wrote:
> On Fri, May 10, 2024 at 12:35:51AM +, Justin Stitt wrote:
> > @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t 
> > offset, int whence)
> > struct dentry *dentry = file->f_path.dentry;
> > switch (whence) {
> > case 1:
> > -   offset += file->f_pos;
> > +   /* cannot represent offset with loff_t */
> > +   if (check_add_overflow(offset, file->f_pos, ))
> > +   return -EOVERFLOW;
> 
> Instead of -EINVAL it correctly returns in such cases?  Why?

We have file->f_pos in range 0..LLONG_MAX.  We are adding a value in
range LLONG_MIN..LLONG_MAX.  The sum (in $\Bbb Z$) cannot be below
LLONG_MIN or above 2 * LLONG_MAX, so if it can't be represented by loff_t,
it must have been in range LLONG_MAX + 1 .. 2 * LLONG_MAX.  Result of
wraparound would be equal to that sum - 2 * LLONG_MAX - 2, which is going
to be in no greater than -2.  We will run
fallthrough;
case 0:
if (offset >= 0)
break;
fallthrough;
default:
return -EINVAL;
and fail with -EINVAL.

Could you explain why would -EOVERFLOW be preferable and why should we
engage in that bit of cargo cult?



Re: [PATCH] libfs: fix accidental overflow in offset calculation

2024-05-09 Thread Al Viro
On Fri, May 10, 2024 at 12:35:51AM +, Justin Stitt wrote:
> @@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, 
> int whence)
>   struct dentry *dentry = file->f_path.dentry;
>   switch (whence) {
>   case 1:
> - offset += file->f_pos;
> + /* cannot represent offset with loff_t */
> + if (check_add_overflow(offset, file->f_pos, ))
> + return -EOVERFLOW;

Instead of -EINVAL it correctly returns in such cases?  Why?



[PATCH] libfs: fix accidental overflow in offset calculation

2024-05-09 Thread Justin Stitt
Running syzkaller with the newly reintroduced signed integer overflow
sanitizer gives this report:

[ 6008.464680] UBSAN: signed-integer-overflow in ../fs/libfs.c:149:11
[ 6008.468664] 9223372036854775807 + 16387 cannot be represented in type 
'loff_t' (aka 'long long')
[ 6008.474167] CPU: 1 PID: 1214 Comm: syz-executor.0 Not tainted 
6.8.0-rc2-00041-gec7cb1052e44-dirty #15
[ 6008.479662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.16.3-debian-1.16.3-2 04/01/2014
[ 6008.485276] Call Trace:
[ 6008.486819]  
[ 6008.488258]  dump_stack_lvl+0x93/0xd0
[ 6008.490535]  handle_overflow+0x171/0x1b0
[ 6008.492957]  dcache_dir_lseek+0x3bf/0x3d0
...

Use the check_add_overflow() helper to gracefully check for
unintentional overflow causing wraparound in our offset calculations.

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/359
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt 
---

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Here's the syzkaller reproducer:

| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, 
&(0x7f00)='/sys/kernel/tracing', 0x0, 0x0)
| lseek(r0, 0x4003, 0x0)
| lseek(r0, 0x7fff, 0x1)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/libfs.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 3a6f2cb364f8..3fdc1aaddd45 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -147,7 +147,9 @@ loff_t dcache_dir_lseek(struct file *file, loff_t offset, 
int whence)
struct dentry *dentry = file->f_path.dentry;
switch (whence) {
case 1:
-   offset += file->f_pos;
+   /* cannot represent offset with loff_t */
+   if (check_add_overflow(offset, file->f_pos, ))
+   return -EOVERFLOW;
fallthrough;
case 0:
if (offset >= 0)
@@ -422,7 +424,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t 
offset, int whence)
 {
switch (whence) {
case SEEK_CUR:
-   offset += file->f_pos;
+   /* cannot represent offset with loff_t */
+   if (check_add_overflow(offset, file->f_pos, ))
+   return -EOVERFLOW;
fallthrough;
case SEEK_SET:
if (offset >= 0)

---
base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc
change-id: 20240509-b4-sio-libfs-67947244a4a3

Best regards,
--
Justin Stitt 




[PATCH v2] fs: fix unintentional arithmetic wraparound in offset calculation

2024-05-09 Thread Justin Stitt
When running syzkaller with the newly reintroduced signed integer
overflow sanitizer we encounter this report:

[   67.995501] UBSAN: signed-integer-overflow in ../fs/read_write.c:91:10
[   68.67] 9223372036854775807 + 4096 cannot be represented in type 
'loff_t' (aka 'long long')
[   68.006266] CPU: 4 PID: 10851 Comm: syz-executor.5 Not tainted 
6.8.0-rc2-00035-gb3ef86b5a957 #1
[   68.012353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.16.3-debian-1.16.3-2 04/01/2014
[   68.018983] Call Trace:
[   68.020803]  
[   68.022540]  dump_stack_lvl+0x93/0xd0
[   68.025222]  handle_overflow+0x171/0x1b0
[   68.028053]  generic_file_llseek_size+0x35b/0x380

amongst others:
UBSAN: signed-integer-overflow in ../fs/read_write.c:1657:12
142606336 - -9223372036854775807 cannot be represented in type 'loff_t' (aka 
'long long')
...
UBSAN: signed-integer-overflow in ../fs/read_write.c:1666:11
9223372036854775807 - -9223231299366420479 cannot be represented in type 
'loff_t' (aka 'long long')

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Fix the accidental overflow in these position and offset calculations
by checking for negative position values, using check_add_overflow()
helpers and clamping values to expected ranges.

Since @offset is later limited by @maxsize, we can proactively safeguard
against exceeding that value (and by extension avoiding integer overflow):
loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
{
if (offset < 0 && !unsigned_offsets(file))
return -EINVAL;
if (offset > maxsize)
return -EINVAL;
...

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/358
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt 
---
Changes in v2:
- fix some more cases syzkaller found in read_write.c
- use min over min_t as the types are the same
- Link to v1: 
https://lore.kernel.org/r/20240509-b4-sio-read_write-v1-1-06bec2022...@google.com
---
Here's the syzkaller reproducer:
| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, 
&(0x7f00)='/sys/kernel/address_bits', 0x0, 0x98)
| lseek(r0, 0x7fff, 0x2)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/read_write.c  | 18 +++---
 fs/remap_range.c | 12 ++--
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d4c036e82b6c..d116e6e3eb3d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -88,7 +88,7 @@ generic_file_llseek_size(struct file *file, loff_t offset, 
int whence,
 {
switch (whence) {
case SEEK_END:
-   offset += eof;
+   offset = min(offset, maxsize - eof) + eof;
break;
case SEEK_CUR:
/*
@@ -105,7 +105,8 @@ generic_file_llseek_size(struct file *file, loff_t offset, 
int whence,
 * like SEEK_SET.
 */
spin_lock(>f_lock);
-   offset = vfs_setpos(file, file->f_pos + offset, maxsize);
+   offset = vfs_setpos(file, min(file->f_pos, maxsize - offset) +
+ offset, maxsize);
spin_unlock(>f_lock);
return offset;
case SEEK_DATA:
@@ -1416,7 +1417,7 @@ static int generic_copy_file_checks(struct file *file_in, 
loff_t pos_in,
struct inode *inode_in = file_inode(file_in);
struct inode *inode_out = file_inode(file_out);
uint64_t count = *req_count;
-   loff_t size_in;
+   loff_t size_in, in_sum, out_sum;
int ret;
 
ret = generic_file_rw_checks(file_in, file_out);
@@ -1450,8 +1451,8 @@ static int generic_copy_file_checks(struct file *file_in, 
loff_t pos_in,
if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
return -ETXTBSY;
 
-   /* Ensure offsets don't wrap. */
-   if (pos_in + count < pos_in || pos_out + count < pos_out

Re: [PATCH] fs: remove accidental overflow during wraparound check

2024-05-09 Thread Justin Stitt
On Thu, May 9, 2024 at 8:53 AM Jan Kara  wrote:
> > @@ -319,8 +320,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t 
> > offset, loff_t len)
> >   if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode))
> >   return -ENODEV;
> >
> > - /* Check for wrap through zero too */
> > - if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 
> > 0))
> > + /* Check for wraparound */
> > + if (check_add_overflow(offset, len, ))
> > + return -EFBIG;
> > +
> > + /* Now, check bounds */
> > + if (sum > inode->i_sb->s_maxbytes || sum < 0)
> >   return -EFBIG;
>
> But why do you check for sum < 0? We know from previous checks offset >= 0
> && len > 0 so unless we overflow, sum is guaranteed to be > 0.

Fair enough. I suppose with the overflow check in place we can no
longer have a sum less than zero there. If nothing else, it tells
readers of this code what the domain of (offset+len) is. I don't mind
sending a new version, though.

>
> Honza
> --
> Jan Kara 
> SUSE Labs, CR



[PATCH] fs: fix unintentional arithmetic wraparound in offset calculation

2024-05-09 Thread Justin Stitt
When running syzkaller with the newly reintroduced signed integer
overflow sanitizer we encounter this report:

[   67.991989] [ cut here ]
[   67.995501] UBSAN: signed-integer-overflow in ../fs/read_write.c:91:10
[   68.67] 9223372036854775807 + 4096 cannot be represented in type 
'loff_t' (aka 'long long')
[   68.006266] CPU: 4 PID: 10851 Comm: syz-executor.5 Not tainted 
6.8.0-rc2-00035-gb3ef86b5a957 #1
[   68.012353] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.16.3-debian-1.16.3-2 04/01/2014
[   68.018983] Call Trace:
[   68.020803]  
[   68.022540]  dump_stack_lvl+0x93/0xd0
[   68.025222]  handle_overflow+0x171/0x1b0
[   68.028053]  generic_file_llseek_size+0x35b/0x380
...

Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang. It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").

Since @offset is later limited by @maxsize, we can proactively safeguard
against exceeding that value and also dodge some accidental overflow
(which may cause bad file access):

loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
{
if (offset < 0 && !unsigned_offsets(file))
return -EINVAL;
if (offset > maxsize)
return -EINVAL;
...

Link: https://github.com/llvm/llvm-project/pull/82432 [1]
Closes: https://github.com/KSPP/linux/issues/358
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt 
---
Here's the syzkaller reproducer:
| # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:
| # SandboxArg:0 Leak:false NetInjection:false NetDevices:false
| # NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false
| # DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false
| # IEEE802154:false Sysctl:false Swap:false UseTmpDir:false
| # HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false
| # Fault:false FaultCall:0 FaultNth:0}}
| r0 = openat$sysfs(0xff9c, 
&(0x7f00)='/sys/kernel/address_bits', 0x0, 0x98)
| lseek(r0, 0x7fff, 0x2)

... which was used against Kees' tree here (v6.8rc2):
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=wip/v6.9-rc2/unsigned-overflow-sanitizer

... with this config:
https://gist.github.com/JustinStitt/824976568b0f228ccbcbe49f3dee9bf4
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d4c036e82b6c..10c3eaa5ef55 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -88,7 +88,7 @@ generic_file_llseek_size(struct file *file, loff_t offset, 
int whence,
 {
switch (whence) {
case SEEK_END:
-   offset += eof;
+   offset = min_t(loff_t, offset, maxsize - eof) + eof;
break;
case SEEK_CUR:
/*

---
base-commit: 0106679839f7c69632b3b9833c3268c316c0a9fc
change-id: 20240509-b4-sio-read_write-04a17d40620e

Best regards,
--
Justin Stitt 




RE: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread David Laight
...
> I think that would be a completely different area that might be worth
> looking at: instrumenting implicit casts for "drops bits". I'm afraid
> that it's just *so* common than we might not be able to do that
> sanely.

Things like:
buf[0] = val;
buf[1] = val >>= 8;
buf[2] = val >>= 8;
buf[3] = val >>= 8;
for writing a value little-endian and potentially misaligned.
Really doesn't want any annotation.

I've also seen code like:
buf[0] = (unsigned char)(val & 0xff);
not only ugly by it got compiled to:
val &= 0xff // for the &
val &= 0xff // for the cast
byte write to memory.
Modern gcc doesn't do that, but...

There are some spurious casts that drop bits.
I found plenty of dubious min_t(u8/u16,...) examples.
(Well they are dubious, some are just a lot more dubious than others.)
The problem is that every one needs careful inspection just in case
the strange behaviour is required like min_t(u8, val - 1, lo_lim - 1)
which treats lo_lim of zero as 'not a limit' and I think was ok.

A slow, concerted effort to remove min_t() calls wouldn't be a bad thing.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


RE: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread David Laight
...
> Going the other way is similar:
> 
> all_bits = low_bits + ((u64) high_bits << 16) << 16);
> 
> and again, the compiler will recognize this idiom and do the right
> thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> the high bit noise away).

On a 32bit system the compiler might not do the expected thing.

I had terrible trouble with the 32bit div_u64_u32() code I was
playing with getting the compiler to do 'something sensible' for
that.
I just couldn't get it to stop generating two 64bit values
(in two register pairs) and then oring them together.
I didn't try using a union - that might work.
On x64 the asm "A" (edx:eax) and "a" and "d" constraints will DTRT
but force the values into edx:eax which is ok if you are doing divides.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently

2024-05-09 Thread Steven Rostedt
On Thu, 9 May 2024 23:24:20 +0300
Mike Rapoport  wrote:

> On Thu, May 09, 2024 at 01:31:22PM -0400, Steven Rostedt wrote:
> > On Thu, 9 May 2024 00:00:23 -0400
> > Steven Rostedt  wrote:
> >   
> > > I tried this approach and it unfortunately picks a different physical
> > > location every time :-(
> > > 
> > > So it is either adding to e820 tables or we create a new way to
> > > allocate memory at early boot up.
> > >   
> > 
> > Hmm, now I'm testing it more and it always seems to end up in the same
> > location. I'm not sure why it failed the first three times I tried it :-/  
> 
> If the kernel and the command line were the same, they shouldn't have. 
> e820 translates to memblock the same way every boot and the early
> allocations also do not change from boot to boot.
> 
> Could it be that three runs that failed had some differences in the kernel
> parameters?
> 

I wonder if KASLR caused it or not. But when I first added it to
another machine, it failed to get the same address on the second boot,
but was fine after that.

Could be just something with my setup. I'm going to backport this to
5.15 and test this on a Chromebook and see what happens there (as
that's the motivation behind this work).

-- Steve



Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently

2024-05-09 Thread Mike Rapoport
On Thu, May 09, 2024 at 01:31:22PM -0400, Steven Rostedt wrote:
> On Thu, 9 May 2024 00:00:23 -0400
> Steven Rostedt  wrote:
> 
> > I tried this approach and it unfortunately picks a different physical
> > location every time :-(
> > 
> > So it is either adding to e820 tables or we create a new way to
> > allocate memory at early boot up.
> > 
> 
> Hmm, now I'm testing it more and it always seems to end up in the same
> location. I'm not sure why it failed the first three times I tried it :-/

If the kernel and the command line were the same, they shouldn't have. 
e820 translates to memblock the same way every boot and the early
allocations also do not change from boot to boot.

Could it be that three runs that failed had some differences in the kernel
parameters?

> -- Steve

-- 
Sincerely yours,
Mike.



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Al Viro
On Thu, May 09, 2024 at 12:15:40PM -0700, Linus Torvalds wrote:
> On Thu, 9 May 2024 at 11:48, Al Viro  wrote:
> >
> > FWIW, the thing that somewhat worries me about having a helper along
> > the lines of combine_to_u64(low, high) is that
> > foo->splat = combine_to_u64(something, something_else);
> > would be inviting hard-to-catch brainos -
> > foo->splat = combine_to_u64(something_else, something);
> 
> Yeah, we'd have to be very clear about naming and ordering. So it
> would probably have to be something like
> 
> result = combine_to_u64_hi_lo(high, low);
> 
> to be easy to use.
> 
> The good news is that if you *do* get it wrong despite clear naming,
> the resulting value will be so obviously wrong that it's generally a
> "Duh!" thing if you do any testing what-so-ever.
> 
> Of course, I say that as somebody who always points out that I haven't
> tested my own patches at all, and they are "something like this,
> perhaps?".
> 
> But having "hi_lo" kind of naming would hopefully make it really
> obvious even when just looking at the source code.

Or something like
result = to_high32(high) | to_low32(low);
perhaps? ;-)

Re amusing things found by grepping:
unsafe_get_user(lo, &__c->sig[1], label);   \
unsafe_get_user(hi, &__c->sig[0], label);   \
__s->sig[0] = hi | (((long)lo) << 32);  \
(compat.h, be64 unsafe_get_compat_sigset())

It is correct, actually, but here 'hi' is 'signals in range 0..31' and
'lo' - 'signals in range 32..63'.  Introduced in fb05121fd6a2
"signal: Add unsafe_get_compat_sigset()", looks like nobody had read
it carefully enough for a WTF moment - at least no replies along the
lines of 'might be a good idea to use less confusing names' anywhere
on lore...



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Linus Torvalds
On Thu, 9 May 2024 at 11:48, Al Viro  wrote:
>
> FWIW, the thing that somewhat worries me about having a helper along
> the lines of combine_to_u64(low, high) is that
> foo->splat = combine_to_u64(something, something_else);
> would be inviting hard-to-catch brainos -
> foo->splat = combine_to_u64(something_else, something);

Yeah, we'd have to be very clear about naming and ordering. So it
would probably have to be something like

result = combine_to_u64_hi_lo(high, low);

to be easy to use.

The good news is that if you *do* get it wrong despite clear naming,
the resulting value will be so obviously wrong that it's generally a
"Duh!" thing if you do any testing what-so-ever.

Of course, I say that as somebody who always points out that I haven't
tested my own patches at all, and they are "something like this,
perhaps?".

But having "hi_lo" kind of naming would hopefully make it really
obvious even when just looking at the source code.

  Linus



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Al Viro
On Thu, May 09, 2024 at 11:39:04AM -0700, Linus Torvalds wrote:

> .. it might also actually be a good idea *IF* we were to have some
> kind of "implicit cast drops bits" warning, in that the compiler for
> that case wouldn't remove the upper bits calculation, but would
> trigger a warning if they are non-zero.
> 
> So there are actually potential advantages to just always apparently
> doing the full 64-bit arithmetic.
> 
> Without debug warnings, it's a no-op that the compiler will just skip.
> And with some hypothetical debug flag, it would be a "you are now
> losing the high bits of the time value when assigning the result to a
> limited 32-bit time_t" warning.

FWIW, the thing that somewhat worries me about having a helper along
the lines of combine_to_u64(low, high) is that
foo->splat = combine_to_u64(something, something_else);
would be inviting hard-to-catch brainos -
foo->splat = combine_to_u64(something_else, something);
would be very hard to catch on RTFS, especially when you'd been
staring at that code for a long time.  Explicitly spelled out
it would be obvious which goes into bits 0..31 and which in 32..64.



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Linus Torvalds
On Thu, 9 May 2024 at 11:08, Linus Torvalds
 wrote:
>
>  Any half-way decent compiler will end up optimizing away the shifts
> and adds for the high bits because they see the assignment to
> 'all_bits'. There's no point in generating high bits that just get
> thrown away.

.. it might also actually be a good idea *IF* we were to have some
kind of "implicit cast drops bits" warning, in that the compiler for
that case wouldn't remove the upper bits calculation, but would
trigger a warning if they are non-zero.

So there are actually potential advantages to just always apparently
doing the full 64-bit arithmetic.

Without debug warnings, it's a no-op that the compiler will just skip.
And with some hypothetical debug flag, it would be a "you are now
losing the high bits of the time value when assigning the result to a
limited 32-bit time_t" warning.

 Linus



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Linus Torvalds
On Thu, 9 May 2024 at 10:54, Al Viro  wrote:
>
> On Thu, May 09, 2024 at 08:38:28AM -0700, Linus Torvalds wrote:
>
> > Going the other way is similar:
> >
> > all_bits = low_bits + ((u64) high_bits << 16) << 16);
> >
> > and again, the compiler will recognize this idiom and do the right
> > thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> > the high bit noise away).
>
> Umm...  That would make sense if it was
> all_bits = low_bits + ((T) high_bits << 16) << 16);
> with possibly 32bit T.  But the way you wrote that (with u64) it's
> pointless - u64 _can_ be shifted by 32 just fine.

Casting to 'T' is probably a clearer option but doesn't work very well
in a helper functions that may not know what the final type is.

 Any half-way decent compiler will end up optimizing away the shifts
and adds for the high bits because they see the assignment to
'all_bits'. There's no point in generating high bits that just get
thrown away.

Linus



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Al Viro
On Thu, May 09, 2024 at 08:38:28AM -0700, Linus Torvalds wrote:

> Going the other way is similar:
> 
> all_bits = low_bits + ((u64) high_bits << 16) << 16);
> 
> and again, the compiler will recognize this idiom and do the right
> thing (and if 'all_bits' is only 32-bit, the compiler will optimize
> the high bit noise away).

Umm...  That would make sense if it was
all_bits = low_bits + ((T) high_bits << 16) << 16);
with possibly 32bit T.  But the way you wrote that (with u64) it's
pointless - u64 _can_ be shifted by 32 just fine.



Re: [PATCH v2] perf/ring_buffer: Prefer struct_size over open coded arithmetic

2024-05-09 Thread Erick Archer
Hi Kees and Christophe,
First of all, thanks for the reviews, comments and advices.

On Mon, May 06, 2024 at 09:23:15AM -0700, Kees Cook wrote:
> On Sun, May 05, 2024 at 07:31:24PM +0200, Erick Archer wrote:
> > On Sun, May 05, 2024 at 05:24:55PM +0200, Christophe JAILLET wrote:
> > > Le 05/05/2024 à 16:15, Erick Archer a écrit :
> > > > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > > > index 4013408ce012..080537eff69f 100644
> > > > --- a/kernel/events/ring_buffer.c
> > > > +++ b/kernel/events/ring_buffer.c
> > > > @@ -822,9 +822,7 @@ struct perf_buffer *rb_alloc(int nr_pages, long 
> > > > watermark, int cpu, int flags)
> > > > unsigned long size;
> > > 
> > > Hi,
> > > 
> > > Should size be size_t?
> > 
> > I'm sorry, but I don't have enough knowledge to answer this question.
> > The "size" variable is used as a return value by struct_size and as
> > a parameter to the order_base_2() and kzalloc_node() functions.
> 
> For Linux, size_t and unsigned long are the same (currently).
> Pedantically, yes, this should be size_t, but it's the same.

Thanks for this clarification. I will change the type for the next
version.

> 
> > [...]
> > >   all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
> > >   if (!all_buf)
> > >   goto fail_all_buf;
> > >
> > >   rb->user_page = all_buf;
> > >   rb->data_pages[0] = all_buf + PAGE_SIZE;
> > >   if (nr_pages) { <--- here
> > >   rb->nr_pages = 1;   <---
> > >   rb->page_order = ilog2(nr_pages);
> > >   }
> > [...]
> > I think that we don't need to deal with the "nr_pages = 0" case
> > since the flex array will always have a length of one.
> > 
> > Kees, can you help us with this?
> 
> Agh, this code hurt my head for a while.
> 
> all_buf contains "nr_pages + 1" pages. all_buf gets attached to
> rb->user_page, and then rb->data_pages[0] points to the second page in
> all_buf... which means, I guess, that rb->data_pages does only have 1
> entry.
> 
> However, the nr_pages == 0 case is weird. Currently, data_pages[0] will
> still get set (which points ... off the end of all_buf). If we
> unconditionally set rb->nr_pages to 1, we're changing the behavior. If
> we _don't_ set rb->data_pages[0], we're changing the behavior, but I
> think it's an invalid pointer anyway, so this is the safer change to
> make.

Thanks for explain things well.

> I suspect the right replacement is:

> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 4013408ce012..7d638ce76799 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -916,15 +916,11 @@ void rb_free(struct perf_buffer *rb)
>  struct perf_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int 
> flags)
>  {
>   struct perf_buffer *rb;
> - unsigned long size;
>   void *all_buf;
>   int node;
>  
> - size = sizeof(struct perf_buffer);
> - size += sizeof(void *);
> -
>   node = (cpu == -1) ? cpu : cpu_to_node(cpu);
> - rb = kzalloc_node(size, GFP_KERNEL, node);
> + rb = kzalloc_node(struct_size(rb, nr_pages, 1), GFP_KERNEL, node);
>   if (!rb)
>   goto fail;
>  
> @@ -935,9 +931,9 @@ struct perf_buffer *rb_alloc(int nr_pages, long 
> watermark, int cpu, int flags)
>   goto fail_all_buf;
>  
>   rb->user_page = all_buf;
> - rb->data_pages[0] = all_buf + PAGE_SIZE;
>   if (nr_pages) {
>   rb->nr_pages = 1;
> + rb->data_pages[0] = all_buf + PAGE_SIZE;
>   rb->page_order = ilog2(nr_pages);
>   }
>  
Ok, I'll do it like this for the next version.
> 
> 
> Also, why does rb_alloc() take an "int" nr_pages? The only caller has an
> unsigned long argument for nr_pages. Nothing checks for >INT_MAX that I
> can find.

Thanks for letting me know. I will take a look.
> 
> -- 
Again, thanks,
Erick

> Kees Cook



Re: [POC][RFC][PATCH 0/2] pstore/mm/x86: Add wildcard memmap to map pstore consistently

2024-05-09 Thread Steven Rostedt
On Thu, 9 May 2024 00:00:23 -0400
Steven Rostedt  wrote:

> I tried this approach and it unfortunately picks a different physical
> location every time :-(
> 
> So it is either adding to e820 tables or we create a new way to
> allocate memory at early boot up.
> 

Hmm, now I'm testing it more and it always seems to end up in the same
location. I'm not sure why it failed the first three times I tried it :-/

-- Steve



Re: [PATCH] fs: remove accidental overflow during wraparound check

2024-05-09 Thread Jan Kara
On Tue 07-05-24 23:17:57, Justin Stitt wrote:
> Running syzkaller with the newly enabled signed integer overflow
> sanitizer produces this report:
> 
> [  195.401651] [ cut here ]
> [  195.404808] UBSAN: signed-integer-overflow in ../fs/open.c:321:15
> [  195.408739] 9223372036854775807 + 562984447377399 cannot be represented in 
> type 'loff_t' (aka 'long long')
> [  195.414683] CPU: 1 PID: 703 Comm: syz-executor.0 Not tainted 
> 6.8.0-rc2-00039-g14de58dbe653-dirty #11
> [  195.420138] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.16.3-debian-1.16.3-2 04/01/2014
> [  195.425804] Call Trace:
> [  195.427360]  
> [  195.428791]  dump_stack_lvl+0x93/0xd0
> [  195.431150]  handle_overflow+0x171/0x1b0
> [  195.433640]  vfs_fallocate+0x459/0x4f0

Well, we compile the kernel with -fno-strict-overflow for a reason so I
wouldn't consider this a bug. But check_add_overflow() is easier to digest
since we don't have to worry about type details so I'm for this change.

> @@ -319,8 +320,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t 
> offset, loff_t len)
>   if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode))
>   return -ENODEV;
>  
> - /* Check for wrap through zero too */
> - if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
> + /* Check for wraparound */
> + if (check_add_overflow(offset, len, ))
> + return -EFBIG;
> +
> + /* Now, check bounds */
> + if (sum > inode->i_sb->s_maxbytes || sum < 0)
>   return -EFBIG;

But why do you check for sum < 0? We know from previous checks offset >= 0
&& len > 0 so unless we overflow, sum is guaranteed to be > 0.

Honza
-- 
Jan Kara 
SUSE Labs, CR



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Linus Torvalds
On Thu, 9 May 2024 at 07:09, Theodore Ts'o  wrote:
>
> struct ext2_super {
>...
>__u32time_lo;
>__u32time_high;
>...
> }
>
> time_t  now;
>
> sb->time_low = now;
> sb->time_high = now >> 32;
>
> This is obviously (to a human) correct,

Actually, it's not correct at all.

If time_t is 32-bit, that "now >> 32" is broken due to how C shifts
are defined. For a 32-bit type, only shifts of 0-31 bits are
well-defined (due to hardware differences).

> but because of stupid compiler
> tricks, in order to silence compiler-level and ubsan complaints, this
> got turned into:
>
> sb->time_low = now & 0x;
> #if (SIZEOF_TIME_T > 4)
> sb->time_high = (now >> 32) & EXT4_EPOCH_MASK;
> #else
> sb->time_high = 0;
> #endi

This is the wrong solution.

But the solution to the undefined shift isn't some #ifdef.

The idiomatic C solution is to do

high_bits = (all_bits >> 16) >> 16;

which works even if 'all_bits' is 32 bit, and the compiler will know
this idiom, and turn it into just 0.

Going the other way is similar:

all_bits = low_bits + ((u64) high_bits << 16) << 16);

and again, the compiler will recognize this idiom and do the right
thing (and if 'all_bits' is only 32-bit, the compiler will optimize
the high bit noise away).

And yes, this is not great, but there you have it: C was designed to
be a high-level assembler, and you see the effects of "this is how
many hardware shifters work". The shift instructions on many (most?)
architectures end up being limited to the word width.

We actually have some helpers for this in :

  #define upper_32_bits(n) ((u32)(((n) >> 16) >> 16))
  #define lower_32_bits(n) ((u32)((n) & 0x))

but we don't have that "combine 32 bits into 64 bits" helper. Maybe
worth adding.

Linus



Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Theodore Ts'o
On Wed, May 08, 2024 at 11:11:35PM -0700, Kees Cook wrote:
> > I think it would be interesting in general to have some kind of
> > warning for "implicit cast drops bits".
> > 
> > I fear that we'd have an enormous about of them, and maybe they'd be
> > unsolvable without making the code *much* uglier (and sometimes the
> > fix might be to add an explicit cast to document intentionally dropped
> > bits, but explicit casts have their own issues).

Seapking of which, I recently had to work around an overactive
compiler UBSAN which complained about this:

struct ext2_super {
   ...
   __u32time_lo;
   __u32time_high;
   ...
}

time_t  now;

sb->time_low = now;
sb->time_high = now >> 32;

This is obviously (to a human) correct, but because of stupid compiler
tricks, in order to silence compiler-level and ubsan complaints, this
got turned into:


sb->time_low = now & 0x;
#if (SIZEOF_TIME_T > 4)
sb->time_high = (now >> 32) & EXT4_EPOCH_MASK;
#else
sb->time_high = 0;
#endif

and in the opposite case, I was forced to write:

#if (SIZEOF_TIME_T == 4)
return *lo;
#else
return ((time_t)(*hi) << 32) | *lo;
#endif

... and this made me very sad.  Grumble

- Ted




Re: [RFC] Mitigating unexpected arithmetic overflow

2024-05-09 Thread Kees Cook
On Wed, May 08, 2024 at 04:47:25PM -0700, Linus Torvalds wrote:
> For example, the most common case of overflow we've ever had has very
> much been array indexing. Now, sometimes that has actually been actual
> undefined behavior, because it's been overflow in signed variables,
> and those are "easy" to find in the sense that you just say "no, can't
> do that". UBSAN finds them, and that's good.

Yes, the array bounds checking work has been going well. It's been
about 5 years now, plucking out all the weird compiler behaviors,
refactoring code to get rid of ambiguous patterns, etc. It's turtles
and compiler extensions all the way down. It's been a real education on
"this will be easy: the compiler already has all the information". Only
to find, no, it's not, because no one has tried to make any of it work
together sanely in 50 years. But we're finally at the tail end of it,
with clear and unambiguous semantics and mitigations now.

> So I'd suggest seeing if we can catch the cases that *technically*
> aren't UB, but match dangerous overflow situations. IOW, start with
> very limited and very targeted things to look out for.

Yes, exactly. What I'd said in the RFC was that from a terminology
perspective I don't want to focus on "undefined behavior", this means
a very specific thing to the compiler folks. I'm focused on what
_unexpected_ things actually happens in the real world that lead to
exploitable flaws.

> So *that* I feel could be something where you can warn without a ton
> of compiler smarts at all. If you see an *implicit* cast to unsigned
> and then the subsequent operations wraps around, it's probably worth
> being a lot more worried about.

I agree, implicit casts are worth exploring. It's a more limited set of
behaviors, but I remain skeptical about the utility of catching them since
the visibility of those kinds of promotions can quickly go out of scope.

> And basically try to catch cases where arithmetic is explicitly used
> for a size calculation: things like 'kmalloc()' etc are the obvious
> things. And yes, the result will then have an (implicit) cast to
> 'size_t' as part of the calling convention.

We've been doing this kind of static analysis of allocation size
mistakes for a while, but it remains not very well introspected by the
compiler, making it a long game of whack-a-mole. And generally speaking
all the wrapping around has already happened way before it hits the
allocator. We've mostly dealt with all the low hanging code pattern fruit
in this area, but it could be a place to add type-based compiler smarts.

> And, finally, simply pointer arithmetic. Again, this is *not* some
> "omniscience required". This is something very obvious where the
> compiler is *very* aware of it being pointer arithmetic, because
> pointer arithmetic has that implicit (and important) "take size of
> object into account".

Yes, that is already in progress too. Now that we've got __counted_by for
flex arrays, the next addition is __counted_by for pointers. Coverage
there will continue to grow. There is, unfortunately, a rather lot of
alloc-and-forget going on in the kernel where object sizes are just
thrown away. We've done some refactoring in places to add it, but I
suspect the need for that will become more common once we gain coverage
for pointer offsets.

> So again, if you see an integer expression that leads to pointer
> arithmetic, at that point the overflow very much is relevant.

Right, and we've already been doing all of these things. They do all share
the common root cause, though: unexpected wrap-around. But I hear you:
don't go after the root cause, find high-signal things that are close to
it.

> So what I object to - loudly - is some completely bogus argument of
> "integer wraparound is always wrong".

Just so it's not mistaken, I didn't say "always": I said "can be". This
was never a proposal to outlaw all wrap-around. :P

> But catching wrap-around in *specific* cases, if you can argue for why
> it's wrong in *those* cases: that sounds fine.

Right. I mean, that is basically what I proposed. It was just over a
much larger class than it seems you'll tolerate. :P You're asking to keep
the scope smaller, and aimed at places where we know it can go wrong. I
still think this is backwards from what would be best coverage. For the
highest level of flaw mitigation, we want to exclude that which is proven
safe, not try to catch things that might be unsafe. But I do try to be
pragmatic, and it doesn't sound like that approach will be accepted here.

> So in *that* case, you actually have a much more interesting
> situation. Not wrap-around, not overflow, but "implicit cast drops
> significant bits".

Yup! This is what I was talking about in the RFC: implicit integer
truncation. It *is* very nasty, I agree. We can get started on this next
then. I poked around at it earlier this year[1]. My plan was to tackle
it after finishing signed integer overflows (which is well underway).

> And yes, I do