[cc'ing [email protected]; this coreutils thread can be found in <https://lists.gnu.org/r/coreutils/2025-12/threads.html#00055>.]

On 2025-12-20 00:51, Matteo Croce wrote:
> This can be triggered with a huge file:
>
> $ truncate -s $((2**63 - 1)) file1
>
> $ ( dd bs=1M skip=$((2**43 - 2)) count=0 && cat ) < file1
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 2,825e-05 s, 0,0 kB/s
> cat: -: Invalid argument
>
> $ dd if=file1 bs=1M skip=$((2**43 - 2))
> dd: error reading 'file1': Invalid argument
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1,0 MB, 1,0 MiB) copied, 0,103536 s, 10,1 MB/s

OK, but in bleeding-edge coreutils neither of these examples calls copy_file_range. The diagnostics result from plain 'read' syscalls near TYPE_MAXIMUM (off_t). (dd never calls copy_file_range, and ironically the code in 'cat' that does call copy_file_range avoids the overflow itself before invoking copy_file_range, relying on plain 'read' to do the right thing near TYPE_MAXIMUM (off_t).) So these examples have nothing to do with copy_file_range.
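
To make the failure mode concrete, here is a minimal userspace reproducer sketch (mine, untested, assuming a 64-bit off_t, and using the 'file1' created by the 'truncate' command above). A plain read near TYPE_MAXIMUM (off_t) fails with EINVAL even though a short read would be the correct result:

#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
  int fd = open ("file1", O_RDONLY);
  if (fd < 0)
    { perror ("open"); return 1; }

  /* Seek to 4096 bytes before TYPE_MAXIMUM (off_t), which for this
     2**63 - 1 byte file is also 4096 bytes before end of file.  */
  off_t pos = INT64_MAX - 4096;
  if (lseek (fd, pos, SEEK_SET) < 0)
    { perror ("lseek"); return 1; }

  /* pos + sizeof buf would exceed TYPE_MAXIMUM (off_t), so the kernel
     rejects the whole read with EINVAL; the correct behavior would be
     a short read of the 4096 bytes that are present.  */
  char buf[65536];
  ssize_t n = read (fd, buf, sizeof buf);
  if (n < 0)
    printf ("read failed: %s\n", strerror (errno));
  else
    printf ("read returned %zd bytes\n", n);
  return 0;
}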

You've found a Linux kernel bug that affects countless apps, and we can't reasonably expect app developers to patch all the apps to work around the bug. So the fix should be done in the kernel.

I looked at the kernel patch you suggested in <https://lore.kernel.org/linux-fsdevel/[email protected]/T/>. Unfortunately, I see two problems with it, the first minor, the second less so.

The minor problem is that the unpatched kernel code is merely checking, incorrectly, whether pos + count fits into loff_t. MAX_RW_COUNT should not be involved in the fix, as MAX_RW_COUNT is irrelevant to the file offset range. Better would be to do the overflow checks correctly, with something like the attached patch (which I have not compiled or tested).
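
For readers unfamiliar with the idiom in the attached patch: check_add_overflow, from <linux/overflow.h>, wraps the compiler's __builtin_add_overflow and returns true when the mathematically exact sum does not fit in the destination type, and the &(loff_t) {0} compound literal is just a throwaway destination, since only the overflow flag matters here. A tiny userspace sketch of the same idiom, assuming a 64-bit off_t:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

static bool
pos_plus_count_overflows (off_t pos, size_t count)
{
  /* True iff pos + count does not fit in off_t; the compound literal
     is a scratch destination, as in the kernel patch.  */
  return __builtin_add_overflow (pos, count, &(off_t) {0});
}

int
main (void)
{
  printf ("%d\n", pos_plus_count_overflows ((off_t) 1 << 62, 1));  /* prints 0: fits */
  printf ("%d\n", pos_plus_count_overflows (INT64_MAX, 1));        /* prints 1: overflows */
  return 0;
}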

Second, and more important, the patch doesn't fix the real bug, which is that read(FD, BUF, SIZE) fails with -EINVAL if adding SIZE to the current file position would overflow off_t. That's wrong: the syscall should read whatever bytes are present (up to EOF), and then report the number of bytes read. We cannot fix this bug merely via something like the attached patch.

One possible fix for the second problem would be to change rw_verify_area's API to return the possibly-smaller number of bytes that can be read, and then modify its callers to do the right thing ("right" in the sense of "don't try to read past TYPE_MAXIMUM (off_t)"). Alternatively, we could fix rw_verify_area's callers to not try to read past TYPE_MAXIMUM (off_t) without changing the API.
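
To illustrate the second alternative, here is a rough, untested sketch of the idea (mine, not part of the attached patch, and assuming a 64-bit off_t): a caller would clamp the requested count so that pos + count can never pass TYPE_MAXIMUM (off_t), issue the possibly-shorter read, and let the ordinary short-read and EOF semantics report how many bytes were actually transferred.

#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Shrink COUNT so that POS + COUNT cannot exceed TYPE_MAXIMUM (off_t).
   Assumes POS is nonnegative, as it must be for regular files that do
   not use unsigned offsets.  */
size_t
clamp_count_to_off_t_max (off_t pos, size_t count)
{
  uint64_t room = (uint64_t) INT64_MAX - (uint64_t) pos;
  return count < room ? count : (size_t) room;
}

int
main (void)
{
  /* With the file position 4096 bytes below TYPE_MAXIMUM (off_t), a
     64 KiB request is clamped to the 4096 bytes that can legally be
     read; the read then returns 4096 (or fewer at EOF) instead of
     failing with EINVAL.  */
  printf ("%zu\n", clamp_count_to_off_t_max (INT64_MAX - 4096, 65536));
  return 0;
}

Here is the attached patch for the first, minor problem (again, not compiled or tested):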
diff --git a/fs/read_write.c b/fs/read_write.c
index 833bae068770..215d7cdbb1aa 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -459,13 +459,14 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
 	if (ppos) {
 		loff_t pos = *ppos;
 
-		if (unlikely(pos < 0)) {
-			if (!unsigned_offsets(file))
-				return -EINVAL;
-			if (count >= -pos) /* both values are in 0..LLONG_MAX */
-				return -EOVERFLOW;
-		} else if (unlikely((loff_t) (pos + count) < 0)) {
-			if (!unsigned_offsets(file))
+		if (unsigned_offsets(file)) {
+			if (check_add_overflow ((uoff_t) pos, count,
+						&(uoff_t) {0}))
+				return -EINVAL;
+		} else {
+			if (unlikely(pos < 0))
+				return -EINVAL;
+			if (check_add_overflow (pos, count, &(loff_t) {0}))
 				return -EINVAL;
 		}
 	}
