Re: [PATCH v2] fs: btrfs: fix out of bounds write
在 2024/6/19 07:11, Alex Shumsky 写道: Fix btrfs_read/read_and_truncate_page write out of bounds of destination buffer. Old behavior break bootstd malloc'd buffers of exact file size. Previously this OOB write have not been noticed because distroboot usually read files into huge static memory areas. Signed-off-by: Alex Shumsky Fixes: e342718 ("fs: btrfs: Implement btrfs_file_read()") Reviewed-by: Qu Wenruo Thanks, Qu --- Changes in v2: - fix error path handling - add Fixes tag - use min3 fs/btrfs/inode.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4691612eda..3998ffc2c8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -640,7 +640,11 @@ static int read_and_truncate_page(struct btrfs_path *path, extent_type = btrfs_file_extent_type(leaf, fi); if (extent_type == BTRFS_FILE_EXTENT_INLINE) { ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); + if (ret < 0) { + free(buf); + return ret; + } + memcpy(dest, buf + page_off, min3(page_len, ret, len)); free(buf); return len; } @@ -652,7 +656,7 @@ static int read_and_truncate_page(struct btrfs_path *path, free(buf); return ret; } - memcpy(dest, buf + page_off, page_len); + memcpy(dest, buf + page_off, min(page_len, len)); free(buf); return len; }
Re: [PATCH] fs: btrfs: fix out of bounds write
在 2024/6/18 05:19, Alex Shumsky 写道: Fix btrfs_read/read_and_truncate_page write out of bounds of destination buffer. Old behavior break bootstd malloc'd buffers of exact file size. Previously this OOB write have not been noticed because distroboot usually read files into huge static memory areas. Signed-off-by: Alex Shumsky Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4691612eda..b51f578b49 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -640,7 +640,7 @@ static int read_and_truncate_page(struct btrfs_path *path, extent_type = btrfs_file_extent_type(leaf, fi); if (extent_type == BTRFS_FILE_EXTENT_INLINE) { ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); + memcpy(dest, buf + page_off, min(min(page_len, ret), len)); free(buf); return len; } @@ -652,7 +652,7 @@ static int read_and_truncate_page(struct btrfs_path *path, free(buf); return ret; } - memcpy(dest, buf + page_off, page_len); + memcpy(dest, buf + page_off, min(page_len, len)); free(buf); return len; }
Re: BTRFS use-after-free bug at free_extent_buffer_internal
在 2024/4/22 16:45, Qu Wenruo 写道: [...] I added a print statement to free_extent_buffer_internal that prints the start address of the extent_buffer as I'm not sure what to be looking for here. This print statement is before the decrement. printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", eb->start, eb->refs); Just a small advice, in fact you can go with sandbox mode, running U-boot in userspace, and bind a host file as a device to test the filesystem code. At least that's what I did for most U-boot bugs. Thanks, Qu
Re: BTRFS use-after-free bug at free_extent_buffer_internal
在 2024/4/22 16:07, Sachi King 写道: Hi, I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain it's a bug and not a corruption issue. A bit of history on the filesystem. It is a fairly new filesystem as it was being used to give me access to test a wayland application on a Raspberry Pi. The filesystem was about 3 days old when I hit the bug, and I'm fairly certain it never had an unclean shutdown. I have checked the filesystem with "btrfs check" which has found no errors. The filesystem mounts on Linux and is functional. # btrfs check --check-data-csum /dev/sda2 Opening filesystem to check... Checking filesystem on /dev/sda2 UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d [1/7] checking root items [2/7] checking extents [3/7] checking free space tree [4/7] checking fs roots [5/7] checking csums against data [6/7] checking root refs [7/7] checking quota groups skipped (not enabled on this FS) found 5070573568 bytes used, no error found total csum bytes: 4451620 total tree bytes: 370458624 total fs tree bytes: 353124352 total extent tree bytes: 10010624 btree space waste bytes: 62303284 file data blocks allocated: 6786519040 referenced 6328619008 Since btrfs check reports no error, the fs must be valid. But considering how new it is, it may be related to some new features not properly implemented in Uboot. Is it possible to provide the whole binary dump of the fs? I've made an image of the filesystem so I could reproduce the bug in an environment that doesn't require the physical SBC, and have reproduced the issue using the head of the master branch with "qemu-x86_64_defconfig". My testing qemu was produced with the following: # make qemu-x86_64_defconfig # cat << EOF >> .config CONFIG_AUTOBOOT=y CONFIG_BOOTDELAY=1 CONFIG_USE_BOOTCOMMAND=y CONFIG_BOOTSTD_DEFAULTS=y CONFIG_BOOTSTD_FULL=y CONFIG_CMD_BOOTFLOW_FULL=y CONFIG_BOOTCOMMAND="bootflow scan -lb" CONFIG_ENV_IS_NOWHERE=y CONFIG_LZ4=y CONFIG_BZIP2=y CONFIG_ZSTD=y CONFIG_FS_BTRFS=y CONFIG_CMD_BTRFS=y CONFIG_GZIP=y CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi" EOF # make -j24 bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains: / { bootstd { compatible = "u-boot,boot-std"; filename-prefixes = "/@boot/", "/boot/", "/"; bootdev-order = "scsi"; extlinux { compatible = "u-boot,extlinux"; }; }; }; The VM was run with qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown -drive file=/mnt/dbg/u-boot-debug.img The error message I recive on boot is BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()! BUG! resetting ... I added a print statement to free_extent_buffer_internal that prints the start address of the extent_buffer as I'm not sure what to be looking for here. This print statement is before the decrement. printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", eb->start, eb->refs); The last message before the crash reported eb->start to be "0", with 0 refs. free_extent_buffer_internal: eb->start[0] eb->refs[0] The extent at 0 struck me as odd, so I tried commenting out the freeing, by removing the call to free_extent_buffer_final, and this resulted in bootflow succeeding and showing me the boot menu, which suprised me. I expected to see the bug reproduce itself, with refs being zero, but eb->start pointing somewhere valid, but I instead got a valid address with refs at 2. I'm assuming that the order free_extent_buffer_internal is called is deterministic, so by counting the print outputs the line that prior held the extent_buffer with a 0 start was replaced with: free_extent_buffer_internal: eb->start[249c000] eb->refs[2] Interestingly, as can be seen in the full logs with my included print messages, 249c000 is being used just before this, with a ref count of 2. 249c000 does appear to reach a point where it should have been freed in the past, before it gets used again as seen in both logs. The failing boot log: U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000) Trying to boot from SPI Jumping to 64-bit U-Boot: Note many features are missing U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000) CPU: QEMU Virtual CPU version 2.5+ DRAM: 128 MiB Core: 13 devices, 13 uclasses, devicetree: separate Loading Environment from nowhere... OK Model: QEMU x86 (I440FX) Net: e1000: 00:00:00:00:00:00 Error: e1000#0 No valid MAC address found. eth_initialize() No ethernet found. Hit any key to stop autoboot: 0 Scanning for bootflows in all bootdevs Seq Method State UclassPart Name Filename --- --- -- scanning bus for devices... Target spinup took 0 ms. SATA link 1 timeout. Target spinup took 0 ms. SATA link 3 timeout. SATA link 4 timeout. SATA link 5 timeout.
Re: [PATCH] fs: btrfs: fix reading when length specified
On 2023/11/12 01:49, Sam Edwards wrote: The btrfs read function limits the read length to ensure that it and the read offset do not together exceed the size of the file. However, this size was only being queried if the read length was passed a value of zero (meaning "whole file"), and the size is defaulted to 0 otherwise. This means the clamp will just zero out the length if one is specified, preventing reading of the file. Fix this by checking the file size unconditionally, and unifying the default length and clamping logic as a single range check instead. This bug was discovered when trying to boot Linux with initrd= via 'bootefi' from a btrfs partition. The EFI stub entered an infinite loop of zero-length reads while trying to read the initrd, and the boot process stalled indefinitely. I'm not sure why this happend for the EFI environment. Doesn't the EFI runtime should also try to read the whole file? Or that EFI environment has specified the length to read instead? Signed-off-by: Sam Edwards Anyway, the fix looks good to me. Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/btrfs.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 4cdbbbe3d0..1149a3b200 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -228,7 +228,7 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, { struct btrfs_fs_info *fs_info = current_fs_info; struct btrfs_root *root; - loff_t real_size = 0; + loff_t real_size; u64 ino; u8 type; int ret; @@ -246,16 +246,13 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; } - if (!len) { - ret = btrfs_size(file, _size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; + ret = btrfs_size(file, _size); + if (ret < 0) { + error("Failed to get inode size: %s", file); + return ret; } - if (len > real_size - offset) + if (!len || len > real_size - offset) len = real_size - offset; ret = btrfs_file_read(root, ino, offset, len, buf);
Re: [PATCH] btrfs: fix some error checking for btrfs_decompress()
On 2023/8/3 18:29, Dan Carpenter wrote: The btrfs_decompress() function mostly (u32)-1 on error but it can also return -EPERM or other kernel error codes from zstd_decompress(). The "ret" variable is an int, so we could just check for negatives. Signed-off-by: Dan Carpenter Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 38e285bf94b0..4691612eda33 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -390,7 +390,7 @@ int btrfs_read_extent_inline(struct btrfs_path *path, csize); ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf, csize, dbuf, dsize); - if (ret == (u32)-1) { + if (ret < 0) { ret = -EIO; goto out; } @@ -500,7 +500,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf, csize, dbuf, dsize); - if (ret == (u32)-1) { + if (ret < 0) { ret = -EIO; goto out; }
Re: [PATCH] fs: btrfs: Prevent error pointer dereference in list_subvolums()
On 2023/7/26 14:59, Dan Carpenter wrote: If btrfs_read_fs_root() fails with -ENOENT, then we go to the next entry. Fine. But if it fails for a different reason then we need to clean up and return an error code. In the current code it doesn't clean up but instead dereferences "root" and crashes. Signed-off-by: Dan Carpenter Reviewed-by: Qu Wenruo --- I didn't CC the btrfs mailing list. Perhaps, I should have? This patch is fine. The function is specific to U-boot, and not utilized by kernel/btrfs-progs. Thanks, Qu fs/btrfs/subvolume.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/subvolume.c b/fs/btrfs/subvolume.c index d446e7a2c418..68ca7e48e48e 100644 --- a/fs/btrfs/subvolume.c +++ b/fs/btrfs/subvolume.c @@ -199,6 +199,7 @@ static int list_subvolums(struct btrfs_fs_info *fs_info) ret = PTR_ERR(root); if (ret == -ENOENT) goto next; + goto out; } ret = list_one_subvol(root, result); if (ret < 0)
Re: [PATCH] global: Use proper project name U-Boot
On 2023/5/17 15:17, Michal Simek wrote: Use proper project name in comments, Kconfig, readmes. Signed-off-by: Michal Simek [...] diff --git a/fs/btrfs/compat.h b/fs/btrfs/compat.h index 9cf8a10c76c5..02173dea5f48 100644 --- a/fs/btrfs/compat.h +++ b/fs/btrfs/compat.h @@ -46,7 +46,7 @@ /* * Read data from device specified by @desc and @part * - * U-boot equivalent of pread(). + * U-Boot equivalent of pread(). * * Return the bytes of data read. * Return <0 for error. diff --git a/fs/btrfs/extent-io.h b/fs/btrfs/extent-io.h index 6b0c87da969d..5c5c579d1eaf 100644 --- a/fs/btrfs/extent-io.h +++ b/fs/btrfs/extent-io.h @@ -8,7 +8,7 @@ * Use pointer to provide better alignment. * - Remove max_cache_size related interfaces * Includes free_extent_buffer_nocache() - * As we don't cache eb in U-boot. + * As we don't cache eb in U-Boot. * - Include headers * * Write related functions are kept as we still need to modify dummy extent For the btrfs part: Reviewed-by: Qu Wenruo Thanks, Qu
Re: [PATCH U-BOOT v2] btrfs: fix offset when reading compressed extents
On 2023/4/18 14:41, Dominique Martinet wrote: From: Dominique Martinet btrfs_read_extent_reg correctly computed the extent offset in the BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset' part correctly in the compressed case, making the function read incorrect data. In the case I examined, the last 4k of a file was corrupted and contained data from a few blocks prior, e.g. reading a 10k file with a single extent: btrfs_file_read() -> btrfs_read_extent_reg (aligned part loop, until 8k) -> read_and_truncate_page -> btrfs_read_extent_reg (re-reads the last extent from 8k to the end, incorrectly reading the first 2k of data) This can be reproduced as follow: $ truncate -s 200M btr $ mount btr -o compress /mnt $ pat() { dd if=/dev/zero bs=1M count=$1 iflag=count_bytes status=none | tr '\0' "\\$2"; } $ { pat 4K 1; pat 4K 2; pat 2K 3; } > /mnt/file $ sync $ filefrag -v /mnt/file File size of /mnt/file is 10240 (3 blocks of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0.. 2: 3328.. 3330: 3: last,encoded,eof $ umount /mnt Then in u-boot: => load scsi 0 200 file 10240 bytes read in 3 ms (3.3 MiB/s) => md 2001ff0 02001ff0: 02020202 02020202 02020202 02020202 02002000: 01010101 01010101 01010101 01010101 02002010: 01010101 01010101 01010101 01010101 (02002000 onwards should contain '03' pattern but went back to 01, start of the extent) After patch, data is read properly: => md 2001ff0 02001ff0: 02020202 02020202 02020202 02020202 02002000: 03030303 03030303 03030303 03030303 02002010: 03030303 03030303 03030303 03030303 Note that the code previously (before commit e3427184f38a ("fs: btrfs: Implement btrfs_file_read()")) did not split that read in two, so this is a regression even if the previous code might not have been handling offsets correctly either (something that booted now fails to boot) Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()") Signed-off-by: Dominique Martinet Reviewed-by: Qu Wenruo Thanks, Qu --- Changes in v2: - Keep offset decomposition explicit where it is used - Add reproducer/clarify explanation in commit message - Drop other patches temporarily - Link to v1: https://lore.kernel.org/r/20230418-btrfs-extent-reads-v1-0-47ba9839f...@codewreck.org --- fs/btrfs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 40025662f250..38e285bf94b0 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -511,7 +511,9 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (ret < dsize) memset(dbuf + ret, 0, dsize - ret); /* Then copy the needed part */ - memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len); + memcpy(dest, + dbuf + btrfs_file_extent_offset(leaf, fi) + offset - key.offset, + len); ret = len; out: free(cbuf); --- base-commit: 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79 change-id: 20230418-btrfs-extent-reads-e2df6e329ad4 Best regards,
Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end
On 2023/4/18 11:53, Dominique Martinet wrote: Qu Wenruo wrote on Tue, Apr 18, 2023 at 11:21:00AM +0800: No, was just thinking the leading part being a separate loop doesn't seem to make sense either as the code shouldn't care about sector size alignemnt but about full extents. The main concern related to the leading unaligned part is, we need to skip something unaligned from the beginning , while all other situations never need to skip such case (they at most skip the tailing part). Ok, there is one exception for inline extents apparently.. But I'm not still not convinced the `aligned_start != file_offset` check is enough for that either; I'd say it's unlikely but the inline part can be compressed, so we could have a file which has > 4k (sectorsize) of expanded data, so a read from the 4k offset would skip the special handling and fail (reading the whole extent in dest) Btrfs inline has a limit to sectorsize. That means, inlined compressed extent can at most be 4K sized (if 4K is our sector size). So that won't be a problem. Even if that's not possible, reading just the first 10 bytes of an inline extent will be aligned and go through the main loop which just reads the whole extent, so it'll need the same handling as the regular btrfs_read_extent_reg handling at which point it might just as well handle start offset too. If we just read 10 bytes, the aligned part should handle it well. My real concern is what if we read 10 bytes at offset 10 bytes. If this can be handled in the same way of aligned read (and still be reasonable readable), then it would be awesome. That aside taking the loop in order: - lookup_data_extent doesn't care (used in heading/tail) - skipping holes don't care as they explicitely put cursor at start of next extent (or bail out if nothing next) - inline needs fixing anyway as said above - PREALLOC or nothing on disk also goes straight to next and is ok - ah, I see what you meant now, we need to substract the current position within the extent to extent_num_bytes... That's also already a problem, though; to take the same example: 0 8k 16k [extent1 | extent2 ... ] reading from 4k onwards will try to read min(extent_num_bytes, end-cur) = min(8k, 12k) = 8k from the 4k offset which goes over the end of the extent. That's indeed a problem. As most of the Uboot fs drivers only care read the whole file, never really utilize the ability to read part of the file, that path is not properly tested. (Some driver, IIRC ubifs?, doesn't even allow read with non-zero offset) Thanks, Qu That could actually be my resets from the previous mail. So either the first check should just lookup the extent and check that extent start matches current offset instead of checking for sectorsize alignment, or we can just fix the loop and remove the first if.
Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end
On 2023/4/18 11:07, Dominique Martinet wrote: Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:53:41AM +0800: I have a feeling the loop could just be updated to go to the end `while (cur < end)` as it doesn't seem to care about the end alignment... Should I update v2 to do that instead? Yeah, it would be very awesome if you can remove the tailing part completely. Ok, will give it a try. I'll want to test this a bit so might take a day or two as I have other work to finish first. This made me look at the "Read out the leading unaligned part" initial part, and its check only looks at the sector size but it probably has the same problem -- we want to make sure we read any leftover from a previous extent e.g. this file: If you're talking about the same problem mentioned in patch 1, then yes, any read in the middle of an extent would cause problems. No, was just thinking the leading part being a separate loop doesn't seem to make sense either as the code shouldn't care about sector size alignemnt but about full extents. The main concern related to the leading unaligned part is, we need to skip something unaligned from the beginning , while all other situations never need to skip such case (they at most skip the tailing part). Maybe we can do some extra calculation to handle it, but I'm not in a hurry to address the leading part. Another thing is, for all other fs-ish interface (kernel/fuse etc), they all read/write in certain block size, then handle unaligned part from a higher layer. Thus I prefer to have most things handled in an aligned fashion, and only handle the unaligned part specially. But if you can find an elegant way to handle all cases, that would be really awesome! Thanks, Qu If the main loop handles everything correctly then the leading if can also be removed along with "read_and_truncate_page" that would no longer be used. I'll give this a try as well and report back. No matter if it's aligned or not, as btrfs would convert any unaligned part into aligned read. Yes I don't see where it would fail (except that my board does crash), I guess that at this point I should spend some time building a qemu u-boot and hooking up gdb will be faster.. My initial plan is to merge the tailing part into the aligned loop, but since you're already working in this part, feel free to do it. Yes, sure -- removing the if is easy, I'd just rather not make it fail for someone else :)
Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end
On 2023/4/18 10:41, Dominique Martinet wrote: Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:02:00AM +0800: /* Read the tailing unaligned part*/ Can we remove this part completely? IIRC if we read until the target end, the unaligned end part can be completely removed then. The "Read the aligned part" loop stops at aligned_end: while (cur < aligned_end) So it should be possible that the last aligned extent we consider does not contain data until the end, e.g. an offset that ends with the aligned end: 040964123 [extent1-|extent2] In this case the main loop will only read extent1 and we need the "trailing unaligned part" if for extent2. I have a feeling the loop could just be updated to go to the end `while (cur < end)` as it doesn't seem to care about the end alignment... Should I update v2 to do that instead? Yeah, it would be very awesome if you can remove the tailing part completely. This made me look at the "Read out the leading unaligned part" initial part, and its check only looks at the sector size but it probably has the same problem -- we want to make sure we read any leftover from a previous extent e.g. this file: If you're talking about the same problem mentioned in patch 1, then yes, any read in the middle of an extent would cause problems. No matter if it's aligned or not, as btrfs would convert any unaligned part into aligned read. But if we fix the bug you mentioned, I see nothing can going wrong. Unaligned read is converted into aligned one (then copy the target range into the dest buf), and aligned part (including the tailing unaligned part) would be properly handled then. 0 4096 8192 [extent1--|extent2] and reading from 4k with a sectorsize of 4k is probably bad will enter the aligned main loop right away... And I think that'll fail?... Actually not quite sure what's expecting what to be aligned in there, but I just tried some partial reads from non-zero offsets and my board resets almost all the time so I guess I've found something else to dig into. This isn't a priority for me right now but I'll look a bit more when I have more time if you haven't first. My initial plan is to merge the tailing part into the aligned loop, but since you're already working in this part, feel free to do it. And I'm very happy to provide any help on this. Thanks, Qu
Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()
On 2023/4/18 10:05, Dominique Martinet wrote: Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800: The subject can be changed to "btrfs: fix offset when reading compressed extents". The original one is a little too generic. Ok. btrfs_file_read() -> btrfs_read_extent_reg (aligned part loop from 0x4048 to 0x40ba, 128KB at a time) last read had 0x4000 bytes in extent, but aligned_end - cur limited the read to 0x3000 (offset 0x72) -> read_and_truncate_page -> btrfs_read_extent_reg reading the last 0x1000 bytes from offset 0x73000 (end of the previous 0x4000) into 0x40ba3000 here key.offset = 0x7 so we need to use it to recover the 0x3000 offset. You can use "btrfs ins dump-tree" to provide a much easier to read on-disk data layout. key.offset is the offset within the read call, not the offset on disk. The file on disk looks perfectly normal, the condition to trigger the bug is to have a file which size is not sector-aligned and where the last extent is bigger than a sector. Forgot to mention, this bug does not only affect the mentioned case, it affects all partial read of an extent. Even if it's sector aligned. I had a look at dump-tree early on and couldn't actually find my file in there, now the problem is understood it should be easy to make a reproducer so I'll add this info and commands needed to reproduce (+ a link to a fs image just in case) in v2 /* Preallocated or hole , fill @dest with zero */ if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { u64 logical; - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + - btrfs_file_extent_offset(leaf, fi) + - offset - key.offset; + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; This is touching non-compressed path, which is very weird as your commit message said this part is correct. my rationale is that since this was considered once but forgotten the other time it'll be easy to add yet another code path that forgets it later, but I guess it won't change much anyway -- I'll fix the patch making it explicit again. Thanks,
Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()
On 2023/4/18 10:05, Dominique Martinet wrote: Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800: The subject can be changed to "btrfs: fix offset when reading compressed extents". The original one is a little too generic. Ok. btrfs_file_read() -> btrfs_read_extent_reg (aligned part loop from 0x4048 to 0x40ba, 128KB at a time) last read had 0x4000 bytes in extent, but aligned_end - cur limited the read to 0x3000 (offset 0x72) -> read_and_truncate_page -> btrfs_read_extent_reg reading the last 0x1000 bytes from offset 0x73000 (end of the previous 0x4000) into 0x40ba3000 here key.offset = 0x7 so we need to use it to recover the 0x3000 offset. You can use "btrfs ins dump-tree" to provide a much easier to read on-disk data layout. key.offset is the offset within the read call, not the offset on disk. The file on disk looks perfectly normal, the condition to trigger the bug is to have a file which size is not sector-aligned and where the last extent is bigger than a sector. Yes, I understand the runtime offset is involved. But you're still trying to explain the situation with involved on-disk file extent, right? In that case it's way easier to go something like this: We can have a compressed file extent like this: item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53 generation 7 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 32768 ram 32768 extent compression 1 (zlib) Then if we try to read file range [4K, 8K) of inode 257 in Uboot, then we got corrupted data. At least that's the preferred way to explain in btrfs community (with on-disk thus static part, then runtime part). I had a look at dump-tree early on and couldn't actually find my file in there, now the problem is understood it should be easy to make a reproducer so I'll add this info and commands needed to reproduce (+ a link to a fs image just in case) in v2 The reproducer is super easy. # mkfs.btrfs -f $dev # mount $dev -o compress $mnt # xfs_io -f -c "pwrite -i /dev/urandom 0 16K" $mnt/file # umount $mnt Then in uboot # load host 0 $addr file 0x1000 0x1000 # md $addr And compare to regular xxd from kernel. /* Preallocated or hole , fill @dest with zero */ if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { u64 logical; - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + - btrfs_file_extent_offset(leaf, fi) + - offset - key.offset; + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; This is touching non-compressed path, which is very weird as your commit message said this part is correct. my rationale is that since this was considered once but forgotten the other time it'll be easy to add yet another code path that forgets it later, but I guess it won't change much anyway -- I'll fix the patch making it explicit again. OK, that makes sense now. Thanks, Qu Thanks,
Re: [PATCH U-BOOT 3/3] btrfs: btfs_file_read: zero trailing data if no extent was found
On 2023/4/18 09:17, Dominique Martinet wrote: From: Dominique Martinet btfs_file_read's truncate path has a comment noting '>0 means no extent' and bailing out immediately, but the buffer has not been written so probably needs zeroing out. This is a theorical fix only and hasn't been tested on a file that actually runs this code path. IIRC there is a memset() at the very beginning of btrfs_file_read() to set the whole dest memory to zero. This is to handle cases like NO_HOLE cases, which we can skip hole extents. Thanks, Qu Signed-off-by: Dominique Martinet --- fs/btrfs/inode.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index efffec0f2e68..23c006c98c3b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -756,9 +756,12 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, btrfs_release_path(); ret = lookup_data_extent(root, , ino, cur, _offset); - /* <0 is error, >0 means no extent */ - if (ret) + /* <0 is error, >0 means no extent: zero end of buffer */ + if (ret) { + if (ret > 0) + memset(dest + cur, 0, end - cur); goto out; + } fi = btrfs_item_ptr(path.nodes[0], path.slots[0], struct btrfs_file_extent_item); ret = read_and_truncate_page(, fi, cur,
Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end
On 2023/4/18 09:17, Dominique Martinet wrote: From: Dominique Martinet btrfs_file_read main 'aligned read' loop would limit the last read to the aligned end even if the data is present in the extent: there is no reason not to read the data that we know will be presented correctly. If that somehow fails cur only advances up to the aligned_end anyway and the file tail is read through the old code unchanged; this could be required if e.g. the end data is inlined. Signed-off-by: Dominique Martinet --- fs/btrfs/inode.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3d6e39e6544d..efffec0f2e68 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -663,7 +663,8 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, struct btrfs_path path; struct btrfs_key key; u64 aligned_start = round_down(file_offset, fs_info->sectorsize); - u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize); + u64 end = file_offset + len; + u64 aligned_end = round_down(end, fs_info->sectorsize); u64 next_offset; u64 cur = aligned_start; int ret = 0; @@ -743,26 +744,26 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, extent_num_bytes = btrfs_file_extent_num_bytes(path.nodes[0], fi); ret = btrfs_read_extent_reg(, fi, cur, - min(extent_num_bytes, aligned_end - cur), + min(extent_num_bytes, end - cur), dest + cur - file_offset); if (ret < 0) goto out; - cur += min(extent_num_bytes, aligned_end - cur); + cur += min(extent_num_bytes, end - cur); } /* Read the tailing unaligned part*/ Can we remove this part completely? IIRC if we read until the target end, the unaligned end part can be completely removed then. Thanks, Qu - if (file_offset + len != aligned_end) { + if (file_offset + len != cur) { btrfs_release_path(); - ret = lookup_data_extent(root, , ino, aligned_end, + ret = lookup_data_extent(root, , ino, cur, _offset); /* <0 is error, >0 means no extent */ if (ret) goto out; fi = btrfs_item_ptr(path.nodes[0], path.slots[0], struct btrfs_file_extent_item); - ret = read_and_truncate_page(, fi, aligned_end, - file_offset + len - aligned_end, - dest + aligned_end - file_offset); + ret = read_and_truncate_page(, fi, cur, + file_offset + len - cur, + dest + cur - file_offset); } out: btrfs_release_path();
Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()
On 2023/4/18 09:17, Dominique Martinet wrote: From: Dominique Martinet The subject can be changed to "btrfs: fix offset when reading compressed extents". The original one is a little too generic. btrfs_read_extent_reg correctly computed the extent offset in the BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset' part correctly in the compressed case, making the function read incorrect data. In the case I examined, the last 4k of a file was corrupted and contained data from a few blocks prior: btrfs_file_read() -> btrfs_read_extent_reg (aligned part loop from 0x4048 to 0x40ba, 128KB at a time) last read had 0x4000 bytes in extent, but aligned_end - cur limited the read to 0x3000 (offset 0x72) -> read_and_truncate_page -> btrfs_read_extent_reg reading the last 0x1000 bytes from offset 0x73000 (end of the previous 0x4000) into 0x40ba3000 here key.offset = 0x7 so we need to use it to recover the 0x3000 offset. You can use "btrfs ins dump-tree" to provide a much easier to read on-disk data layout. And you can also add a smaller reproducer. Confirmed by checking data, before patch: u-boot=> mmc load 1:1 $loadaddr boot/uImage u-boot=> md 0x40ba 40ba: c0232ff8 c0c722cb 030e94be bf1c./#..".. u-boot=> md 0x40ba3000 40ba3000: c0232ff8 c0c722cb 030e94be bf1c./#..".. After patch (also confirmed with data read from linux): u-boot=> md 0x40ba3000 40ba3000: 64cc9f03 81142256 691c 0c03483c...dV".i --- fs/btrfs/inode.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 40025662f250..3d6e39e6544d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -443,6 +443,8 @@ int btrfs_read_extent_reg(struct btrfs_path *path, IS_ALIGNED(len, fs_info->sectorsize)); ASSERT(offset >= key.offset && offset + len <= key.offset + extent_num_bytes); + /* offset is now offset within extent */ + offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset; I prefer not to use the @offset, explained later. /* Preallocated or hole , fill @dest with zero */ if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC || @@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) { u64 logical; - logical = btrfs_file_extent_disk_bytenr(leaf, fi) + - btrfs_file_extent_offset(leaf, fi) + - offset - key.offset; + logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset; This is touching non-compressed path, which is very weird as your commit message said this part is correct. I prefer the original one to show everything we need to take into consideration. Otherwise looks good to me. Thanks, Qu read = len; num_copies = btrfs_num_copies(fs_info, logical, len); @@ -511,7 +511,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path, if (ret < dsize) memset(dbuf + ret, 0, dsize - ret); /* Then copy the needed part */ - memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len); + memcpy(dest, dbuf + offset, len); ret = len; out: free(cbuf);
[PATCH] fs: btrfs: limit the mapped length to the original length
[BUG] There is a bug report that btrfs driver caused hang during file read: This breaks btrfs on the HiFive Unmatched. => pci enum PCIE-0: Link up (Gen1-x8, Bus0) => nvme scan => load nvme 0:2 0x8c00 /boot/dtb/sifive/hifive-unmatched-a00.dtb [hangs] [CAUSE] The reporter provided some debug output: read_extent_data: cur=615817216, orig_len=16384, cur_len=16384 read_extent_data: btrfs_map_block: cur_len=479944704; ret=0 read_extent_data: ret=0 read_extent_data: cur=615833600, orig_len=4096, cur_len=4096 read_extent_data: btrfs_map_block: cur_len=479928320; ret=0 Note the second and the last line, the @cur_len is 450+MiB, which is almost a chunk size. And inside __btrfs_map_block(), we limits the returned value to stripe length, but that's depending on the chunk type: if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 | BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_DUP)) { /* we limit the length of each bio to what fits in a stripe */ *length = min_t(u64, ce->size - offset, map->stripe_len - stripe_offset); } else { *length = ce->size - offset; } This means, if the chunk is SINGLE profile, then we don't limit the returned length at all, and even for other profiles, we can still return a length much larger than the requested one. [FIX] Properly clamp the returned length, preventing it from returning a much larger range than expected. Reported-by: Andreas Schwab Signed-off-by: Qu Wenruo --- fs/btrfs/volumes.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4aaaeab663f5..7d4095d9ca88 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -956,6 +956,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw, struct btrfs_mapping_tree *map_tree = _info->mapping_tree; struct cache_extent *ce; struct map_lookup *map; + u64 orig_len = *length; u64 offset; u64 stripe_offset; u64 *raid_map = NULL; @@ -1047,6 +1048,7 @@ again: } else { *length = ce->size - offset; } + *length = min_t(u64, *length, orig_len); if (!multi_ret) goto out; -- 2.39.1
Re: [PATCH] fs/btrfs: handle data extents, which crosss stripe boundaries, correctly
On 2023/2/13 00:20, Andreas Schwab wrote: When I print ce->size in __btrfs_map_block, it is almost always 1073741824, which looks bogus. Can you provide the image of that filesystem? Thanks, Qu
Re: Possible bug in BTRFS w/ Duplication
On 2022/12/30 23:28, Sam Winchenbach wrote: I believe you fixed the issue with the patch you presented. I was in the process of testing a similar fix for release and it solved the issue I encountered. But I still want to make sure that only RAID0 on single device can cause the problem. For multi-device RAID0/RAID10/RAID5/6, we don't support them until we can scan all devices in U-boot... Thanks, Qu Thanks, Sam Winchenbach -Original Message- From: U-Boot On Behalf Of Qu Wenruo Sent: Thursday, December 29, 2022 7:01 PM To: Heinrich Schuchardt ; Sam Winchenbach ; Marek Behún Cc: u-boot@lists.denx.de; Qu Wenruo ; linux-bt...@vger.kernel.org Subject: Re: Possible bug in BTRFS w/ Duplication On 2022/12/29 22:12, Heinrich Schuchardt wrote: On 12/28/22 21:51, Sam Winchenbach wrote: Hello, Hello, I have hit the following situation when trying to load files from a BTRFS partition with duplication enabled. You mean multi-device? For DUP/RAID1 duplication, they don't have stripe limitation at all. Thus I believe you're talking about RAID0 (which doesn't have any duplication/extra mirrors) or RAID10 or RAID5/6? But for now, we don't support multi-device in U-boot yet, thus I'm not sure what situation you're talking about. Mind to run the following command? # btrfs fi usage In the first example I read a 16KiB file - __btrfs_map_block() changes the length to something larger than the file being read. This works fine, as length is later clamped to the file size. In the second example, __btrfs_map_block() changes the length parameter to something smaller than the file (the size of a stripe). This seems to break this check here: read = len; num_copies = btrfs_num_copies(fs_info, logical, len); for (i = 1; i <= num_copies; i++) { ret = read_extent_data(fs_info, dest, logical, , i); if (ret < 0 || read != len) { continue; } finished = true; break; } The problem being that read is always less than len. I am not sure if __btrfs_map_block is changing "len" to the incorrect value, or if there is some logic in "read_extent_data" that isn't correct. Any pointers on how this code is supposed to work would be greatly appreciated. Thanks. Thanks for reporting the issue $ scripts/get_maintainer.pl -f fs/btrfs/volumes.c suggests to include "Marek Behún" (maintainer:BTRFS) Qu Wenruo (reviewer:BTRFS) linux-bt...@vger.kernel.org to the communication. Best regards Heinrich === EXAMPLE 2 === Zynq> load mmc 1:0 0 16K [btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part === [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 16384, len = 16384) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 16384) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 28672) [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 16384) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 16384) [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 16384, len = 16384) cur: 0, extent_num_bytes: 16384, aligned_end: 16384 16384 bytes read in 100 ms (159.2 KiB/s) === EXAMPLE 2 === Zynq> load mmc 1:0 0 32K [btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part === [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 32768, len = 32768) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 32768) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 12288) So the first 3 sectors are before the stripe boundary and we read it correctly. [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 12288, len = 32768) [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 12288, len = 32768) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 12288) I believe this is the problem. If we're reading the full 32K, and the first 12K is in the first stripe, we should then try to map the remaining 20K, not the 12K again. I'll look into the situation. But if you can provide the image or the dump, it can greatly help the debugging. Thanks, Qu [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 12288) [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 12288, len = 32768) file: fs/btrfs/inode.c, line: 468 cur: 0, extent_num_bytes: 32768, aligned_end: 32768 -> btrfs_read_extent_reg: -5, line: 75
[PATCH] fs/btrfs: handle data extents, which crosss stripe boundaries, correctly
[BUG] Since btrfs supports single device RAID0 at mkfs time after btrfs-progs v5.14, if we create a single device raid0 btrfs, and created a file crossing stripe boundary: # mkfs.btrfs -m dup -d raid0 test.img # mount test.img mnt # xfs_io -f -c "pwrite 0 128K" mnt/file # umount mnt Since btrfs is using 64K as stripe length, above 128K data write is definitely going to cross at least one stripe boundary. Then u-boot would fail to read above 128K file: => host bind 0 /home/adam/test.img => ls host 0 < > 131072 Fri Dec 30 00:18:25 2022 file => load host 0 0 file BTRFS: An error occurred while reading file file Failed to load 'file' [CAUSE] Unlike tree blocks read, data extent reads doesn't consider cases in which one data extent can cross stripe boundary. In read_data_extent(), we just call btrfs_map_block() once and read the first mapped range. And if the first mapped range is smaller than the desired range, it would return error. But since even single device btrfs can utilize RAID0 profiles, the first mapped range can only be at most 64K for RAID0 profiles, and cause false error. [FIX] Just like read_whole_eb(), we should call btrfs_map_block() in a loop until we read all data. Since we're here, also add extra error messages for the following cases: - btrfs_map_block() failure We already have the error message for it. - Missing device This should not happen, as we only support single device for now. - __btrfs_devread() failure With this bug fixed, btrfs driver of u-boot can properly read the above 128K file, and have the correct content: => host bind 0 /home/adam/test.img => ls host 0 < > 131072 Fri Dec 30 00:18:25 2022 file => load host 0 0 file 131072 bytes read in 0 ms => md5sum 0 0x2 md5 for ... 0001 ==> d48858312a922db7eb86377f638dbc9f ^^^ Above md5sum also matches. Reported-by: Sam Winchenbach Signed-off-by: Qu Wenruo --- fs/btrfs/disk-io.c | 49 +- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3f0d9f1c113b..7eaa7e949604 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -541,34 +541,39 @@ struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical, u64 *len, int mirror) { - u64 offset = 0; + u64 orig_len = *len; + u64 cur = logical; struct btrfs_multi_bio *multi = NULL; struct btrfs_device *device; int ret = 0; - u64 max_len = *len; - ret = btrfs_map_block(fs_info, READ, logical, len, , mirror, - NULL); - if (ret) { - fprintf(stderr, "Couldn't map the block %llu\n", - logical + offset); - goto err; - } - device = multi->stripes[0].dev; + while (cur < logical + orig_len) { + u64 cur_len = logical + orig_len - cur; - if (*len > max_len) - *len = max_len; - if (!device->desc || !device->part) { - ret = -EIO; - goto err; - } - - ret = __btrfs_devread(device->desc, device->part, data, *len, - multi->stripes[0].physical); - if (ret != *len) - ret = -EIO; - else + ret = btrfs_map_block(fs_info, READ, cur, _len, , + mirror, NULL); + if (ret) { + error("Couldn't map the block %llu", cur); + goto err; + } + device = multi->stripes[0].dev; + if (!device->desc || !device->part) { + error("devid %llu is missing", device->devid); + ret = -EIO; + goto err; + } + ret = __btrfs_devread(device->desc, device->part, + data + (cur - logical), cur_len, + multi->stripes[0].physical); + if (ret != cur_len) { + error("read failed on devid %llu physical %llu", + device->devid, multi->stripes[0].physical); + ret = -EIO; + goto err; + } + cur += cur_len; ret = 0; + } err: kfree(multi); return ret; -- 2.39.0
Re: Possible bug in BTRFS w/ Duplication
On 2022/12/29 22:12, Heinrich Schuchardt wrote: On 12/28/22 21:51, Sam Winchenbach wrote: Hello, Hello, I have hit the following situation when trying to load files from a BTRFS partition with duplication enabled. You mean multi-device? For DUP/RAID1 duplication, they don't have stripe limitation at all. Thus I believe you're talking about RAID0 (which doesn't have any duplication/extra mirrors) or RAID10 or RAID5/6? But for now, we don't support multi-device in U-boot yet, thus I'm not sure what situation you're talking about. Mind to run the following command? # btrfs fi usage In the first example I read a 16KiB file - __btrfs_map_block() changes the length to something larger than the file being read. This works fine, as length is later clamped to the file size. In the second example, __btrfs_map_block() changes the length parameter to something smaller than the file (the size of a stripe). This seems to break this check here: read = len; num_copies = btrfs_num_copies(fs_info, logical, len); for (i = 1; i <= num_copies; i++) { ret = read_extent_data(fs_info, dest, logical, , i); if (ret < 0 || read != len) { continue; } finished = true; break; } The problem being that read is always less than len. I am not sure if __btrfs_map_block is changing "len" to the incorrect value, or if there is some logic in "read_extent_data" that isn't correct. Any pointers on how this code is supposed to work would be greatly appreciated. Thanks. Thanks for reporting the issue $ scripts/get_maintainer.pl -f fs/btrfs/volumes.c suggests to include "Marek Behún" (maintainer:BTRFS) Qu Wenruo (reviewer:BTRFS) linux-bt...@vger.kernel.org to the communication. Best regards Heinrich === EXAMPLE 2 === Zynq> load mmc 1:0 0 16K [btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part === [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 16384, len = 16384) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 16384) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 28672) [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 16384) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 16384) [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 16384, len = 16384) cur: 0, extent_num_bytes: 16384, aligned_end: 16384 16384 bytes read in 100 ms (159.2 KiB/s) === EXAMPLE 2 === Zynq> load mmc 1:0 0 32K [btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part === [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 32768, len = 32768) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 32768) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 12288) So the first 3 sectors are before the stripe boundary and we read it correctly. [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 12288, len = 32768) [btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data (ret = 0, read = 12288, len = 32768) [read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len = 12288) I believe this is the problem. If we're reading the full 32K, and the first 12K is in the first stripe, we should then try to map the remaining 20K, not the 12K again. I'll look into the situation. But if you can provide the image or the dump, it can greatly help the debugging. Thanks, Qu [read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len = 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 12288) [btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data (ret = 0, read = 12288, len = 32768) file: fs/btrfs/inode.c, line: 468 cur: 0, extent_num_bytes: 32768, aligned_end: 32768 -> btrfs_read_extent_reg: -5, line: 758 BTRFS: An error occurred while reading file 32K Failed to load '32K' Sam Winchenbach Embedded Software Engineer III Tethers Unlimited, Inc. | Connect Your Universe | www.tethers.com swinchenb...@tethers.com | C: 207-974-6934 11711 North Creek Pkwy # D113, Bothell, WA 98011-8808, USA
Re: [PATCH RESEND] fs: btrfs: remove the usage of undeclared fs_mutex variable
On 2022/9/28 23:23, Pankaj Raghav wrote: This line probably got in by mistake as there is no fs_mutex member in the btrfs_fs_info struct. Signed-off-by: Pankaj Raghav Reviewed-by: Qu Wenruo Thanks, Qu --- For UBOOT. The patch I sent the first time is not showing up in the uboot mailing list because of the missing approval. So, resending the patch. fs/btrfs/disk-io.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8043abc1bd..c80f8e8028 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void) fs_info->fs_root_tree = RB_ROOT; cache_tree_init(_info->mapping_tree.cache_tree); - mutex_init(_info->fs_mutex); - return fs_info; free_all: btrfs_free_fs_info(fs_info);
Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable
On 2022/9/27 18:49, Pankaj Raghav wrote: Which branch is the code based on? I actually cloned it from u-boot github master. https://github.com/u-boot/u-boot/blob/f117c54cc83e3c519883edb5a48062644d38c443/fs/btrfs/disk-io.c#L785 I don't believe it's upstream, as such compiling error should be exposed very easily. Actually, I was also surprised but the compiler did not give me any error because the mutex definition was as follows: #define mutex_init(...) and the compiler did not generate any error because we don't do anything the parameter. --- fs/btrfs/disk-io.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no btrfs_new_fs_info() function defined anyway. Hmm, I sent these patches for u-boot. Maybe there is a misunderstanding here? :) Yeah, you're completely right, I thought it's for kernel by default... Sorry for the confusion. Qu THanks, Qu index 8043abc1bd..c80f8e8028 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void) fs_info->fs_root_tree = RB_ROOT; cache_tree_init(_info->mapping_tree.cache_tree); - mutex_init(_info->fs_mutex); - return fs_info; free_all: btrfs_free_fs_info(fs_info);
Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable
On 2022/9/27 18:39, Qu Wenruo wrote: On 2022/9/27 17:55, Pankaj Raghav wrote: This line probably got in by mistake as there is no fs_mutex member in the btrfs_fs_info struct. Signed-off-by: Pankaj Raghav Which branch is the code based on? I don't believe it's upstream, as such compiling error should be exposed very easily. --- fs/btrfs/disk-io.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no btrfs_new_fs_info() function defined anyway. My bad, I didn't notice it's for Uboot, not kernel. In that case, you can safely remove it, but it won't cause compile error since in , we define mutex_init() as noop, thus it doesn't cause any compile error. Anyway the patch itself looks good, I also checked if there is other mutex related usage, and this is the only one. Reviewed-by: Qu Wenruo Thanks, Qu THanks, Qu index 8043abc1bd..c80f8e8028 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void) fs_info->fs_root_tree = RB_ROOT; cache_tree_init(_info->mapping_tree.cache_tree); - mutex_init(_info->fs_mutex); - return fs_info; free_all: btrfs_free_fs_info(fs_info);
Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable
On 2022/9/27 17:55, Pankaj Raghav wrote: This line probably got in by mistake as there is no fs_mutex member in the btrfs_fs_info struct. Signed-off-by: Pankaj Raghav Which branch is the code based on? I don't believe it's upstream, as such compiling error should be exposed very easily. --- fs/btrfs/disk-io.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no btrfs_new_fs_info() function defined anyway. THanks, Qu index 8043abc1bd..c80f8e8028 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void) fs_info->fs_root_tree = RB_ROOT; cache_tree_init(_info->mapping_tree.cache_tree); - mutex_init(_info->fs_mutex); - return fs_info; free_all: btrfs_free_fs_info(fs_info);
[PATCH v3 8/8] fs: erofs: add unaligned read range handling
I'm not an expert on erofs, but my quick glance didn't expose any special handling on unaligned range, thus I think the U-boot erofs driver doesn't really support unaligned read range. This patch will add erofs_get_blocksize() so erofs can benefit from the generic unaligned read support. Cc: Huang Jianan Cc: linux-er...@lists.ozlabs.org Signed-off-by: Qu Wenruo --- fs/erofs/internal.h | 1 + fs/erofs/super.c| 6 ++ fs/fs.c | 2 +- include/erofs.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4af7c91560cc..d368a6481bf1 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -83,6 +83,7 @@ struct erofs_sb_info { u16 available_compr_algs; u16 lz4_max_distance; u32 checksum; + u32 blocksize; u16 extra_devices; union { u16 devt_slotoff; /* used for mkfs */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 8277d9b53fb3..82625da59001 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -99,7 +99,13 @@ int erofs_read_superblock(void) sbi.build_time = le64_to_cpu(dsb->build_time); sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec); + sbi.blocksize = 1 << blkszbits; memcpy(, dsb->uuid, sizeof(dsb->uuid)); return erofs_init_devices(, dsb); } + +int erofs_get_blocksize(const char *filename) +{ + return sbi.blocksize; +} diff --git a/fs/fs.c b/fs/fs.c index 43c7128bcfc5..2ac43c05fcd8 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -376,7 +376,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = erofs_get_blocksize, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, diff --git a/include/erofs.h b/include/erofs.h index 1fbe82bf72cb..18bd6807c538 100644 --- a/include/erofs.h +++ b/include/erofs.h @@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int erofs_read(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int erofs_get_blocksize(const char *filename); int erofs_size(const char *filename, loff_t *size); int erofs_exists(const char *filename); void erofs_close(void); -- 2.37.1
[PATCH v3 7/8] fs: ubifs: rely on higher layer to do unaligned read
Currently ubifs doesn't support unaligned read offset, thanks to the recent _fs_read() work to handle unaligned read, we only need to implement ubifs_get_blocksize() to take advantage of it. Now ubifs can do unaligned read without any problem. Signed-off-by: Qu Wenruo --- Unfortunately I can not test ubifs, as enabling UBI would cause compile failure due to missing of header. --- fs/fs.c | 2 +- fs/ubifs/ubifs.c | 13 - include/ubifs_uboot.h | 1 + 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index ea4325cd0b00..43c7128bcfc5 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -312,7 +312,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ubifs_get_blocksize, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c index d3026e310168..a8ab556dd376 100644 --- a/fs/ubifs/ubifs.c +++ b/fs/ubifs/ubifs.c @@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t offset, *actread = 0; - if (offset & (PAGE_SIZE - 1)) { - printf("ubifs: Error offset must be a multiple of %d\n", - PAGE_SIZE); - return -1; - } + /* Higher layer should ensure it always pass page aligned range. */ + assert(IS_ALIGNED(offset, PAGE_SIZE)); + assert(IS_ALIGNED(size, PAGE_SIZE)); c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY); /* ubifs_findfile will resolve symlinks, so we know that we get @@ -920,6 +918,11 @@ out: return err; } +int ubifs_get_blocksize(const char *filename) +{ + return PAGE_SIZE; +} + void ubifs_close(void) { } diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h index b025779d59ff..bcd21715314a 100644 --- a/include/ubifs_uboot.h +++ b/include/ubifs_uboot.h @@ -29,6 +29,7 @@ int ubifs_exists(const char *filename); int ubifs_size(const char *filename, loff_t *size); int ubifs_read(const char *filename, void *buf, loff_t offset, loff_t size, loff_t *actread); +int ubifs_get_blocksize(const char *filename); void ubifs_close(void); #endif /* __UBIFS_UBOOT_H__ */ -- 2.37.1
[PATCH v3 6/8] fs: fat: rely on higher layer to get block aligned read range
Just implement fat_get_blocksize() for fat, so that fat_read_file() always get a block aligned read range. Unfortunately I'm not experienced enough to cleanup the fat code, thus further cleanup is appreciated. Cc: Tom Rini Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 13 + fs/fs.c | 2 +- include/fat.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index dcceccbcee0a..e13035e8e6d1 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ret; } +int fat_get_blocksize(const char *filename) +{ + fsdata fsdata = {0}; + int ret; + + ret = get_fs_info(); + if (ret) + return ret; + + free(fsdata.fatbuf); + return fsdata.sect_size; +} + typedef struct { struct fs_dir_stream parent; struct fs_dirent dirent; diff --git a/fs/fs.c b/fs/fs.c index b26cc8e2e840..ea4325cd0b00 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -208,7 +208,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = fat_get_blocksize, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, diff --git a/include/fat.h b/include/fat.h index a9756fb4cd1b..c03a2bebecef 100644 --- a/include/fat.h +++ b/include/fat.h @@ -201,6 +201,7 @@ int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); int file_fat_read(const char *filename, void *buffer, int maxsize); +int fat_get_blocksize(const char *filename); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.37.1
[PATCH v3 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code to handle the unaligned read (aka, read range is not aligned to fs block size). This makes underlying fs code harder to implement, as they have to handle unaligned read all by themselves. This patch will change the behavior, starting from btrfs, by moving the unaligned read code into _fs_read(). The idea is pretty simple, if we have an unaligned read request, we handle it in the following steps: 1. Grab the blocksize of the fs 2. Read the leading unaligned range We will read the block that @offset is in, and copy the requested part into buf. The the block we read covers the whole range, we just call it a day. 3. Read the remaining part The tailing part may be unaligned, but all fses handles the tailing part much easier than the leading unaligned part. As they just need to do a min(extent_size, start + len - cur) to calculate the real read size. In fact, for most file reading, the file size is not aligned and we need to handle the tailing part anyway. There is a btrfs specific cleanup involved: - In btrfs_file_read(), merge the tailing unaligned read into the main loop. Just reuse the existing read length calculation is enough. - Remove read_and_truncate_page() call Since there is no explicit leading/tailing unaligned read anymore. This has been tested with a proper randomly populated btrfs file, then tried in sandbox mode with different aligned and unaligned range and compare the output with md5sum. Cc: Marek Behun Cc: linux-bt...@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 10 fs/btrfs/inode.c | 89 +++- fs/fs.c | 130 --- include/btrfs.h | 1 + 4 files changed, 141 insertions(+), 89 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 74a992fa012d..ac0e972d0249 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, int ret; ASSERT(fs_info); + + /* Higher layer has ensures it never pass unaligned offset in. */ + ASSERT(IS_ALIGNED(offset, fs_info->sectorsize)); + ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID, file, , , , 40); if (ret < 0) { @@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return 0; } +int btrfs_get_blocksize(const char *filename) +{ + ASSERT(current_fs_info); + return current_fs_info->sectorsize; +} + void btrfs_close(void) { if (current_fs_info) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 40025662f250..f12be46f6262 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -617,44 +617,6 @@ check_next: return 1; } -static int read_and_truncate_page(struct btrfs_path *path, - struct btrfs_file_extent_item *fi, - int start, int len, char *dest) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_fs_info *fs_info = leaf->fs_info; - u64 aligned_start = round_down(start, fs_info->sectorsize); - u8 extent_type; - char *buf; - int page_off = start - aligned_start; - int page_len = fs_info->sectorsize - page_off; - int ret; - - ASSERT(start + len <= aligned_start + fs_info->sectorsize); - buf = malloc_cache_aligned(fs_info->sectorsize); - if (!buf) - return -ENOMEM; - - extent_type = btrfs_file_extent_type(leaf, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); - free(buf); - return len; - } - - ret = btrfs_read_extent_reg(path, fi, - round_down(start, fs_info->sectorsize), - fs_info->sectorsize, buf); - if (ret < 0) { - free(buf); - return ret; - } - memcpy(dest, buf + page_off, page_len); - free(buf); - return len; -} - int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, char *dest) { @@ -663,7 +625,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, struct btrfs_path path; struct btrfs_key key; u64 aligned_start = round_down(file_offset, fs_info->sectorsize); - u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize); u64 next_offset; u64 cur = aligned_start; int ret = 0; @@ -673,34 +634,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, /* Set the whole dest all zero, so we won't nee
[PATCH v3 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file(). Signed-off-by: Qu Wenruo --- Several cleanup candicate: 1. ext_fs->blksz is never populated, thus it's always 0 We can not easily grab blocksize just like btrfs in this case. Thus we have to go the same calculation in ext4fs_read_file(). 2. can not easily cleanup @skipfirst in ext4fs_read_file() It seems to screw up with soft link read. --- fs/ext4/ext4fs.c | 22 ++ fs/fs.c | 2 +- include/ext4fs.h | 1 + 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c index 4c89152ce4ad..b0568e77a895 100644 --- a/fs/ext4/ext4fs.c +++ b/fs/ext4/ext4fs.c @@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, short status; struct ext_block_cache cache; + assert(IS_ALIGNED(pos, blocksize)); + ext_cache_init(); /* Adjust len so it we can't read past the end of the file. */ @@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ext4fs_read(buf, offset, len, len_read); } +int ext4_get_blocksize(const char *filename) +{ + struct ext_filesystem *fs; + int log2blksz; + int log2_fs_blocksize; + loff_t file_len; + int ret; + + ret = ext4fs_open(filename, _len); + if (ret < 0) { + printf("** File not found %s **\n", filename); + return -1; + } + fs = get_fs(); + log2blksz = fs->dev_desc->log2blksz; + log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz; + + return (1 << (log2_fs_blocksize + log2blksz)); +} + int ext4fs_uuid(char *uuid_str) { if (ext4fs_root == NULL) diff --git a/fs/fs.c b/fs/fs.c index 6d43d616de4b..b26cc8e2e840 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -237,7 +237,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ext4_get_blocksize, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, diff --git a/include/ext4fs.h b/include/ext4fs.h index cb5d9cc0a5c0..0f4cf32dcc2a 100644 --- a/include/ext4fs.h +++ b/include/ext4fs.h @@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int ext4_get_blocksize(const char *filename); int ext4_read_superblock(char *buffer); int ext4fs_uuid(char *uuid_str); void ext_cache_init(struct ext_block_cache *cache); -- 2.37.1
[PATCH v3 3/8] fs: btrfs: fix a crash if specified range is beyond file size
[BUG] When try to read a range beyond file size, btrfs driver will cause crash/segfault: => load host 0 $kernel_addr_r 5k_file 0 0x2000 SEGFAULT [CAUSE] In btrfs_read(), if @len is 0, we will truncated it to file end, but if file end is beyond our file size, this truncation will underflow @len, making it -3K in this case. And later that @len is used to memzero the output buffer, resulting above crash. [FIX] Just error out if @offset is already beyond our file size. Now it will fail properly with correct error message: => load host 0 $kernel_addr_r 5m_origin 0 0x2000 BTRFS: Read range beyond file size, offset 8192 file size 5120 Failed to load '5m_origin' Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 309cd595d37d..74a992fa012d 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return ret; } + if (offset >= real_size) { + error("Read range beyond file size, offset %llu file size %llu", + offset, real_size); + return -EINVAL; + } + /* * If the length is 0 (meaning read the whole file) or the range is * beyond file size, truncate it to the end of the file. -- 2.37.1
[PATCH v3 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0
[BUG] When testing with unaligned read, if a specific length is passed in, btrfs driver will read out nothing: => load host 0 $kernel_addr_r 5k_file 0x1000 0 0 bytes read in 0 ms But if no length is passed in, it works fine, even if we pass a non-zero length: => load host 0 $kernel_addr_r 5k_file 0 0x1000 1024 bytes read in 0 ms [CAUSE] In btrfs_read() if we have a larger size than our file, we will try to truncate it using the file size. However the real file size is not initialized if @len is not zero, thus we always truncate our length to 0, and cause the problem. [FIX] Fix it by just always do the file size check. In fact btrfs_size() always follow soft link, thus it will return the real file size correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 4cdbbbe3d066..309cd595d37d 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; } - if (!len) { - ret = btrfs_size(file, _size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; + ret = btrfs_size(file, _size); + if (ret < 0) { + error("Failed to get inode size: %s", file); + return ret; } - if (len > real_size - offset) + /* +* If the length is 0 (meaning read the whole file) or the range is +* beyond file size, truncate it to the end of the file. +*/ + if (!len || len > real_size - offset) len = real_size - offset; ret = btrfs_file_read(root, ino, offset, len, buf); -- 2.37.1
[PATCH v3 1/8] fs: fat: unexport file_fat_read_at()
That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; } -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread) +static int file_fat_read_at(const char *filename, loff_t pos, void *buffer, + loff_t maxsize, loff_t *actread) { fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread); int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.37.1
[PATCH v3 0/8] U-boot: fs: add generic unaligned read offset handling
[CHANGELOG] v3: - Fix an error that we always return 0 actread bytes for unsupported fses For unsupported fses, we should also populate @total_read. Or we will just read the data but still return 0 for actually bytes. Now it pass all test_fs* cases. v2 - Fix a linkage error where (U64 % U32) is called without proper helper Fix it with U64 & (U32 - 1), as the U32 value (@blocksize) should always be power of 2, thus (@blocksize - 1) is the mask we want to calculate the offset inside the block. Above change only affects the 4th patch, everything else is not touched. RFC->v1: - More (manual) testing Unfortunately, in the latest master (75967970850a), the fs-tests.sh always seems to hang at preparing the fs image. Thus still has to do manual testing, tested btrfs, ext4 and fat, with aligned and unaligned read, also added soft link read, all looks fine here. Extra testing is still appreciated. - Two more btrfs specific bug fixes All exposed during manual tests - Remove the tailing unaligned block handling In fact, all fses can easily handle such case, just a min() call is enough. - Remove the support for sandboxfs Since it's using read() calls, really no need to do block alignment check. - Enhanced blocksize check Ensure the returned blocksize is not only non-error, but also non-zero. Qu Wenruo (8): fs: fat: unexport file_fat_read_at() fs: btrfs: fix a bug which no data get read if the length is not 0 fs: btrfs: fix a crash if specified range is beyond file size fs: btrfs: move the unaligned read code to _fs_read() for btrfs fs: ext4: rely on _fs_read() to handle leading unaligned block read fs: fat: rely on higher layer to get block aligned read range fs: ubifs: rely on higher layer to do unaligned read fs: erofs: add unaligned read range handling fs/btrfs/btrfs.c | 33 --- fs/btrfs/inode.c | 89 +++-- fs/erofs/internal.h | 1 + fs/erofs/super.c | 6 ++ fs/ext4/ext4fs.c | 22 +++ fs/fat/fat.c | 17 +- fs/fs.c | 130 +++--- fs/ubifs/ubifs.c | 13 +++-- include/btrfs.h | 1 + include/erofs.h | 1 + include/ext4fs.h | 1 + include/fat.h | 3 +- include/ubifs_uboot.h | 1 + 13 files changed, 212 insertions(+), 106 deletions(-) -- 2.37.1
Re: [PATCH v2 1/8] fs: fat: unexport file_fat_read_at()
On 2022/8/6 05:14, Tom Rini wrote: On Tue, Jul 26, 2022 at 01:22:09PM +0800, Qu Wenruo wrote: That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo Unfortunately, the series fails CI: https://source.denx.de/u-boot/u-boot/-/jobs/478838 OK, it's a bug in the unsupported fses (which squashfs doesn't support) The actual read bytes is not updated. Sorry for the inconvenience. Any idea that how to run the full tests locally so I can prevent such problem? Thanks, Qu
[PATCH v2 8/8] fs: erofs: add unaligned read range handling
I'm not an expert on erofs, but my quick glance didn't expose any special handling on unaligned range, thus I think the U-boot erofs driver doesn't really support unaligned read range. This patch will add erofs_get_blocksize() so erofs can benefit from the generic unaligned read support. Cc: Huang Jianan Cc: linux-er...@lists.ozlabs.org Signed-off-by: Qu Wenruo Reviewed-by: Huang Jianan --- fs/erofs/internal.h | 1 + fs/erofs/super.c| 6 ++ fs/fs.c | 2 +- include/erofs.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4af7c91560cc..d368a6481bf1 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -83,6 +83,7 @@ struct erofs_sb_info { u16 available_compr_algs; u16 lz4_max_distance; u32 checksum; + u32 blocksize; u16 extra_devices; union { u16 devt_slotoff; /* used for mkfs */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 4cca322b9ead..df01d2e719a7 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -99,7 +99,13 @@ int erofs_read_superblock(void) sbi.build_time = le64_to_cpu(dsb->build_time); sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec); + sbi.blocksize = 1 << blkszbits; memcpy(, dsb->uuid, sizeof(dsb->uuid)); return erofs_init_devices(, dsb); } + +int erofs_get_blocksize(const char *filename) +{ + return sbi.blocksize; +} diff --git a/fs/fs.c b/fs/fs.c index 2b847d83597b..23cfb1f5025b 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -375,7 +375,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = erofs_get_blocksize, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, diff --git a/include/erofs.h b/include/erofs.h index 1fbe82bf72cb..18bd6807c538 100644 --- a/include/erofs.h +++ b/include/erofs.h @@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int erofs_read(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int erofs_get_blocksize(const char *filename); int erofs_size(const char *filename, loff_t *size); int erofs_exists(const char *filename); void erofs_close(void); -- 2.37.0
[PATCH v2 7/8] fs: ubifs: rely on higher layer to do unaligned read
Currently ubifs doesn't support unaligned read offset, thanks to the recent _fs_read() work to handle unaligned read, we only need to implement ubifs_get_blocksize() to take advantage of it. Now ubifs can do unaligned read without any problem. Signed-off-by: Qu Wenruo --- fs/fs.c | 2 +- fs/ubifs/ubifs.c | 13 - include/ubifs_uboot.h | 1 + 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index 3eb540c5fe30..2b847d83597b 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -311,7 +311,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ubifs_get_blocksize, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c index d3026e310168..a8ab556dd376 100644 --- a/fs/ubifs/ubifs.c +++ b/fs/ubifs/ubifs.c @@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t offset, *actread = 0; - if (offset & (PAGE_SIZE - 1)) { - printf("ubifs: Error offset must be a multiple of %d\n", - PAGE_SIZE); - return -1; - } + /* Higher layer should ensure it always pass page aligned range. */ + assert(IS_ALIGNED(offset, PAGE_SIZE)); + assert(IS_ALIGNED(size, PAGE_SIZE)); c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY); /* ubifs_findfile will resolve symlinks, so we know that we get @@ -920,6 +918,11 @@ out: return err; } +int ubifs_get_blocksize(const char *filename) +{ + return PAGE_SIZE; +} + void ubifs_close(void) { } diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h index b025779d59ff..bcd21715314a 100644 --- a/include/ubifs_uboot.h +++ b/include/ubifs_uboot.h @@ -29,6 +29,7 @@ int ubifs_exists(const char *filename); int ubifs_size(const char *filename, loff_t *size); int ubifs_read(const char *filename, void *buf, loff_t offset, loff_t size, loff_t *actread); +int ubifs_get_blocksize(const char *filename); void ubifs_close(void); #endif /* __UBIFS_UBOOT_H__ */ -- 2.37.0
[PATCH v2 6/8] fs: fat: rely on higher layer to get block aligned read range
Just implement fat_get_blocksize() for fat, so that fat_read_file() always get a block aligned read range. Unfortunately I'm not experienced enough to cleanup the fat code, thus further cleanup is appreciated. Cc: Tom Rini Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 13 + fs/fs.c | 2 +- include/fat.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index dcceccbcee0a..e13035e8e6d1 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ret; } +int fat_get_blocksize(const char *filename) +{ + fsdata fsdata = {0}; + int ret; + + ret = get_fs_info(); + if (ret) + return ret; + + free(fsdata.fatbuf); + return fsdata.sect_size; +} + typedef struct { struct fs_dir_stream parent; struct fs_dirent dirent; diff --git a/fs/fs.c b/fs/fs.c index 3d6cc6b38b26..3eb540c5fe30 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = fat_get_blocksize, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, diff --git a/include/fat.h b/include/fat.h index a9756fb4cd1b..c03a2bebecef 100644 --- a/include/fat.h +++ b/include/fat.h @@ -201,6 +201,7 @@ int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); int file_fat_read(const char *filename, void *buffer, int maxsize); +int fat_get_blocksize(const char *filename); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.37.0
[PATCH v2 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file(). Signed-off-by: Qu Wenruo --- fs/ext4/ext4fs.c | 22 ++ fs/fs.c | 2 +- include/ext4fs.h | 1 + 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c index 4c89152ce4ad..b0568e77a895 100644 --- a/fs/ext4/ext4fs.c +++ b/fs/ext4/ext4fs.c @@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, short status; struct ext_block_cache cache; + assert(IS_ALIGNED(pos, blocksize)); + ext_cache_init(); /* Adjust len so it we can't read past the end of the file. */ @@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ext4fs_read(buf, offset, len, len_read); } +int ext4_get_blocksize(const char *filename) +{ + struct ext_filesystem *fs; + int log2blksz; + int log2_fs_blocksize; + loff_t file_len; + int ret; + + ret = ext4fs_open(filename, _len); + if (ret < 0) { + printf("** File not found %s **\n", filename); + return -1; + } + fs = get_fs(); + log2blksz = fs->dev_desc->log2blksz; + log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz; + + return (1 << (log2_fs_blocksize + log2blksz)); +} + int ext4fs_uuid(char *uuid_str) { if (ext4fs_root == NULL) diff --git a/fs/fs.c b/fs/fs.c index 1e9f778e1f11..3d6cc6b38b26 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ext4_get_blocksize, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, diff --git a/include/ext4fs.h b/include/ext4fs.h index cb5d9cc0a5c0..0f4cf32dcc2a 100644 --- a/include/ext4fs.h +++ b/include/ext4fs.h @@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int ext4_get_blocksize(const char *filename); int ext4_read_superblock(char *buffer); int ext4fs_uuid(char *uuid_str); void ext_cache_init(struct ext_block_cache *cache); -- 2.37.0
[PATCH v2 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code to handle the unaligned read (aka, read range is not aligned to fs block size). This makes underlying fs code harder to implement, as they have to handle unaligned read all by themselves. This patch will change the behavior, starting from btrfs, by moving the unaligned read code into _fs_read(). The idea is pretty simple, if we have an unaligned read request, we handle it in the following steps: 1. Grab the blocksize of the fs 2. Read the leading unaligned range We will read the block that @offset is in, and copy the requested part into buf. The the block we read covers the whole range, we just call it a day. 3. Read the remaining part The tailing part may be unaligned, but all fses handles the tailing part much easier than the leading unaligned part. As they just need to do a min(extent_size, start + len - cur) to calculate the real read size. In fact, for most file reading, the file size is not aligned and we need to handle the tailing part anyway. There is a btrfs specific cleanup involved: - In btrfs_file_read(), merge the tailing unaligned read into the main loop. Just reuse the existing read length calculation is enough. - Remove read_and_truncate_page() call Since there is no explicit leading/tailing unaligned read anymore. This has been tested with a proper randomly populated btrfs file, then tried in sandbox mode with different aligned and unaligned range and compare the output with md5sum. Cc: Marek Behun Cc: linux-bt...@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 10 fs/btrfs/inode.c | 89 +++- fs/fs.c | 130 --- include/btrfs.h | 1 + 4 files changed, 141 insertions(+), 89 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index bf9e1f2f17cf..7c8f4a3dfb87 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, int ret; ASSERT(fs_info); + + /* Higher layer has ensures it never pass unaligned offset in. */ + ASSERT(IS_ALIGNED(offset, fs_info->sectorsize)); + ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID, file, , , , 40); if (ret < 0) { @@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return 0; } +int btrfs_get_blocksize(const char *filename) +{ + ASSERT(current_fs_info); + return current_fs_info->sectorsize; +} + void btrfs_close(void) { if (current_fs_info) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0173d30cd8ab..aa198c5aaf1f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -617,44 +617,6 @@ check_next: return 1; } -static int read_and_truncate_page(struct btrfs_path *path, - struct btrfs_file_extent_item *fi, - int start, int len, char *dest) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_fs_info *fs_info = leaf->fs_info; - u64 aligned_start = round_down(start, fs_info->sectorsize); - u8 extent_type; - char *buf; - int page_off = start - aligned_start; - int page_len = fs_info->sectorsize - page_off; - int ret; - - ASSERT(start + len <= aligned_start + fs_info->sectorsize); - buf = malloc_cache_aligned(fs_info->sectorsize); - if (!buf) - return -ENOMEM; - - extent_type = btrfs_file_extent_type(leaf, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); - free(buf); - return len; - } - - ret = btrfs_read_extent_reg(path, fi, - round_down(start, fs_info->sectorsize), - fs_info->sectorsize, buf); - if (ret < 0) { - free(buf); - return ret; - } - memcpy(dest, buf + page_off, page_len); - free(buf); - return len; -} - int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, char *dest) { @@ -663,7 +625,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, struct btrfs_path path; struct btrfs_key key; u64 aligned_start = round_down(file_offset, fs_info->sectorsize); - u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize); u64 next_offset; u64 cur = aligned_start; int ret = 0; @@ -673,34 +634,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, /* Set the whole dest all zero, so we won't nee
[PATCH v2 3/8] fs: btrfs: fix a crash if specified range is beyond file size
[BUG] When try to read a range beyond file size, btrfs driver will cause crash/segfault: => load host 0 $kernel_addr_r 5k_file 0 0x2000 SEGFAULT [CAUSE] In btrfs_read(), if @len is 0, we will truncated it to file end, but if file end is beyond our file size, this truncation will underflow @len, making it -3K in this case. And later that @len is used to memzero the output buffer, resulting above crash. [FIX] Just error out if @offset is already beyond our file size. Now it will fail properly with correct error message: => load host 0 $kernel_addr_r 5m_origin 0 0x2000 BTRFS: Read range beyond file size, offset 8192 file size 5120 Failed to load '5m_origin' Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 9145727058d4..bf9e1f2f17cf 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return ret; } + if (offset >= real_size) { + error("Read range beyond file size, offset %llu file size %llu", + offset, real_size); + return -EINVAL; + } + /* * If the length is 0 (meaning read the whole file) or the range is * beyond file size, truncate it to the end of the file. -- 2.37.0
[PATCH v2 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0
[BUG] When testing with unaligned read, if a specific length is passed in, btrfs driver will read out nothing: => load host 0 $kernel_addr_r 5k_file 0x1000 0 0 bytes read in 0 ms But if no length is passed in, it works fine, even if we pass a non-zero length: => load host 0 $kernel_addr_r 5k_file 0 0x1000 1024 bytes read in 0 ms [CAUSE] In btrfs_read() if we have a larger size than our file, we will try to truncate it using the file size. However the real file size is not initialized if @len is not zero, thus we always truncate our length to 0, and cause the problem. [FIX] Fix it by just always do the file size check. In fact btrfs_size() always follow soft link, thus it will return the real file size correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 741c6e20f533..9145727058d4 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; } - if (!len) { - ret = btrfs_size(file, _size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; + ret = btrfs_size(file, _size); + if (ret < 0) { + error("Failed to get inode size: %s", file); + return ret; } - if (len > real_size - offset) + /* +* If the length is 0 (meaning read the whole file) or the range is +* beyond file size, truncate it to the end of the file. +*/ + if (!len || len > real_size - offset) len = real_size - offset; ret = btrfs_file_read(root, ino, offset, len, buf); -- 2.37.0
[PATCH v2 1/8] fs: fat: unexport file_fat_read_at()
That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; } -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread) +static int file_fat_read_at(const char *filename, loff_t pos, void *buffer, + loff_t maxsize, loff_t *actread) { fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread); int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.37.0
[PATCH v2 0/8] U-boot: fs: add generic unaligned read offset handling
[CHANGELOG] v2->v1: - Fix a linkage error where (U64 % U32) is called without proper helper Fix it with U64 & (U32 - 1), as the U32 value (@blocksize) should always be power of 2, thus (@blocksize - 1) is the mask we want to calculate the offset inside the block. Above change only affects the 4th patch, everything else is not touched. RFC->v1: - More (manual) testing Unfortunately, in the latest master (75967970850a), the fs-tests.sh always seems to hang at preparing the fs image. Thus still has to do manual testing, tested btrfs, ext4 and fat, with aligned and unaligned read, also added soft link read, all looks fine here. Extra testing is still appreciated. - Two more btrfs specific bug fixes All exposed during manual tests - Remove the tailing unaligned block handling In fact, all fses can easily handle such case, just a min() call is enough. - Remove the support for sandboxfs Since it's using read() calls, really no need to do block alignment check. - Enhanced blocksize check Ensure the returned blocksize is not only non-error, but also non-zero. This patchset can be fetched from github: https://github.com/adam900710/u-boot/tree/fs_unaligned_read [BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses. Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size). But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case. [ADVANTAGE] This patchset will handle unaligned read range in _fs_read(): - Get blocksize of the underlying fs - Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer. If the first block covers the whole range, we just call it aday. - Read the remaining range which starts at block aligned offset For most common case, which is 0 offset and 0 len, the code will not be changed at all. Just one extra get_blocksize() call, and for FAT/Btrfs/EROFS they all have cached blocksize, thus it takes almost no extra cost. Although for EXT4, it doesn't seem to cache the blocksize globally, thus has to do a path resolve and grab the blocksize. [DISADVANTAGE] The involved problem is: - Extra path resolving All those supported fs may have to do one extra path resolve if the read offset is not aligned. For EXT4, it will do one extra path resolve just to grab the blocksize. For data read which starts at offset 0 (the most common case), it should cause *NO* difference in performance. As the extra handling is only for unaligned offset. The common path is not really modified. [SUPPORTED FSES] - Btrfs (manually tested*) - Ext4 (manually tested) - FAT (manually tested) - Erofs - ubifs (unable to test, due to compile failure) *: Failed to get the test cases run, thus have to go sandbox mode, and attach an image with target fs, load the target file (with unaligned range) and compare the result using md5sum. For EXT4/FAT, they may need extra cleanup, as their existing unaligned range handling is no longer needed anymore, cleaning them up should free more code lines than the added one. Just not confident enough to modify them all by myself. [UNSUPPORTED FSES] - Squashfs They don't support non-zero offset, thus it can not handle the block aligned range. Need extra help to add block aligned offset support. - Semihostfs - Sandboxfs They all use read() directly, no need to do alignment check at all. Extra testing/feedback is always appreciated. Qu Wenruo (8): fs: fat: unexport file_fat_read_at() fs: btrfs: fix a bug which no data get read if the length is not 0 fs: btrfs: fix a crash if specified range is beyond file size fs: btrfs: move the unaligned read code to _fs_read() for btrfs fs: ext4: rely on _fs_read() to handle leading unaligned block read fs: fat: rely on higher layer to get block aligned read range fs: ubifs: rely on higher layer to do unaligned read fs: erofs: add unaligned read range handling fs/btrfs/btrfs.c | 33 --- fs/btrfs/inode.c | 89 +++-- fs/erofs/internal.h | 1 + fs/erofs/super.c | 6 ++ fs/ext4/ext4fs.c | 22 +++ fs/fat/fat.c | 17 +- fs/fs.c | 130 +++--- fs/ubifs/ubifs.c | 13 +++-- include/btrfs.h | 1 + include/erofs.h | 1 + include/ext4fs.h | 1 + include/fat.h | 3 +- include/ubifs_uboot.h | 1 + 13 files changed, 212 insertions(+), 106 deletions(-) -- 2.37.0
Re: [PATCH 1/8] fs: fat: unexport file_fat_read_at()
On 2022/7/26 06:28, Tom Rini wrote: On Wed, Jun 29, 2022 at 07:38:22PM +0800, Qu Wenruo wrote: That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo The series has a fails to build on nokia_rx51: https://source.denx.de/u-boot/u-boot/-/jobs/471877#L483 which to me says doing 64bit division (likely related to block size, etc) without using the appropriate helper macros to turn them in to bit shifts instead. Should I update and resend the series or just send the incremental update to fix the U64/U32 division? Thanks, Qu
Re: [PATCH RFC 6/8] fs: sandboxfs: add sandbox_fs_get_blocksize()
On 2022/6/30 18:06, Simon Glass wrote: On Tue, 28 Jun 2022 at 01:28, Qu Wenruo wrote: This is to make sandboxfs to report blocksize it supports for _fs_read() to handle unaligned read. Unlike all other fses, sandboxfs can handle unaligned read/write without any problem since it's calling read()/write(), which doesn't bother the blocksize at all. This change is mostly to make testing of _fs_read() much easier. Cc: Simon Glass Signed-off-by: Qu Wenruo --- arch/sandbox/cpu/os.c | 11 +++ fs/fs.c| 2 +- fs/sandbox/sandboxfs.c | 14 ++ include/os.h | 8 include/sandboxfs.h| 1 + 5 files changed, 35 insertions(+), 1 deletion(-) Reviewed-by: Simon Glass with a comment as requested below diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c index 5ea54179176c..6c29f29bdd9b 100644 --- a/arch/sandbox/cpu/os.c +++ b/arch/sandbox/cpu/os.c @@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count) return read(fd, buf, count); } +ssize_t os_get_blocksize(int fd) +{ + struct stat stat = {0}; + int ret; + + ret = fstat(fd, ); + if (ret < 0) + return -errno; + return stat.st_blksize; +} + ssize_t os_write(int fd, const void *buf, size_t count) { return write(fd, buf, count); diff --git a/fs/fs.c b/fs/fs.c index 7e4ead9b790b..337d5711c28c 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = sandbox_fs_get_blocksize, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c index 4ae41d5b4db1..130fee088621 100644 --- a/fs/sandbox/sandboxfs.c +++ b/fs/sandbox/sandboxfs.c @@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, void *buffer, return ret; } +int sandbox_fs_get_blocksize(const char *filename) +{ + int fd; + int ret; + + fd = os_open(filename, OS_O_RDONLY); + if (fd < 0) + return fd; + + ret = os_get_blocksize(fd); + os_close(fd); + return ret; +} + int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer, loff_t towrite, loff_t *actwrite) { diff --git a/include/os.h b/include/os.h index 10e198cf503e..a864d9ca39b2 100644 --- a/include/os.h +++ b/include/os.h @@ -26,6 +26,14 @@ struct sandbox_state; */ ssize_t os_read(int fd, void *buf, size_t count); +/** + * Get the optimial blocksize through stat() call. + * + * @fd:File descriptor as returned by os_open() + * Return: >=0 for the blocksize. <0 for error. + */ +ssize_t os_get_blocksize(int fd); + /** * Access to the OS write() system call * diff --git a/include/sandboxfs.h b/include/sandboxfs.h index 783dd5c88a73..6937068f7b82 100644 --- a/include/sandboxfs.h +++ b/include/sandboxfs.h @@ -32,6 +32,7 @@ void sandbox_fs_close(void); int sandbox_fs_ls(const char *dirname); int sandbox_fs_exists(const char *filename); int sandbox_fs_size(const char *filename, loff_t *size); +int sandbox_fs_get_blocksize(const char *filename); Please add a full comment. This is already removed in the formal version: https://patchwork.kernel.org/project/linux-btrfs/list/?series=654990 As sandbox is just calling host OS read() to handle IO, thus it doesn't need to bother the alignment at all. Thanks, Qu int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int fs_write_sandbox(const char *filename, void *buf, loff_t offset, -- 2.36.1 Regards, Simon
[PATCH 7/8] fs: ubifs: rely on higher layer to do unaligned read
Currently ubifs doesn't support unaligned read offset, thanks to the recent _fs_read() work to handle unaligned read, we only need to implement ubifs_get_blocksize() to take advantage of it. Now ubifs can do unaligned read without any problem. Signed-off-by: Qu Wenruo --- Unfortunately I can not test ubifs, as enabling UBI would cause compile failure due to missing of header. --- fs/fs.c | 2 +- fs/ubifs/ubifs.c | 13 - include/ubifs_uboot.h | 1 + 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index 09625f52e913..61bae1051406 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -311,7 +311,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ubifs_get_blocksize, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c index d3026e310168..a8ab556dd376 100644 --- a/fs/ubifs/ubifs.c +++ b/fs/ubifs/ubifs.c @@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t offset, *actread = 0; - if (offset & (PAGE_SIZE - 1)) { - printf("ubifs: Error offset must be a multiple of %d\n", - PAGE_SIZE); - return -1; - } + /* Higher layer should ensure it always pass page aligned range. */ + assert(IS_ALIGNED(offset, PAGE_SIZE)); + assert(IS_ALIGNED(size, PAGE_SIZE)); c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY); /* ubifs_findfile will resolve symlinks, so we know that we get @@ -920,6 +918,11 @@ out: return err; } +int ubifs_get_blocksize(const char *filename) +{ + return PAGE_SIZE; +} + void ubifs_close(void) { } diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h index b025779d59ff..bcd21715314a 100644 --- a/include/ubifs_uboot.h +++ b/include/ubifs_uboot.h @@ -29,6 +29,7 @@ int ubifs_exists(const char *filename); int ubifs_size(const char *filename, loff_t *size); int ubifs_read(const char *filename, void *buf, loff_t offset, loff_t size, loff_t *actread); +int ubifs_get_blocksize(const char *filename); void ubifs_close(void); #endif /* __UBIFS_UBOOT_H__ */ -- 2.36.1
[PATCH 8/8] fs: erofs: add unaligned read range handling
I'm not an expert on erofs, but my quick glance didn't expose any special handling on unaligned range, thus I think the U-boot erofs driver doesn't really support unaligned read range. This patch will add erofs_get_blocksize() so erofs can benefit from the generic unaligned read support. Cc: Huang Jianan Cc: linux-er...@lists.ozlabs.org Signed-off-by: Qu Wenruo --- fs/erofs/internal.h | 1 + fs/erofs/super.c| 6 ++ fs/fs.c | 2 +- include/erofs.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4af7c91560cc..d368a6481bf1 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -83,6 +83,7 @@ struct erofs_sb_info { u16 available_compr_algs; u16 lz4_max_distance; u32 checksum; + u32 blocksize; u16 extra_devices; union { u16 devt_slotoff; /* used for mkfs */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 4cca322b9ead..df01d2e719a7 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -99,7 +99,13 @@ int erofs_read_superblock(void) sbi.build_time = le64_to_cpu(dsb->build_time); sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec); + sbi.blocksize = 1 << blkszbits; memcpy(, dsb->uuid, sizeof(dsb->uuid)); return erofs_init_devices(, dsb); } + +int erofs_get_blocksize(const char *filename) +{ + return sbi.blocksize; +} diff --git a/fs/fs.c b/fs/fs.c index 61bae1051406..e92174d89c28 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -375,7 +375,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = erofs_get_blocksize, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, diff --git a/include/erofs.h b/include/erofs.h index 1fbe82bf72cb..18bd6807c538 100644 --- a/include/erofs.h +++ b/include/erofs.h @@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int erofs_read(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int erofs_get_blocksize(const char *filename); int erofs_size(const char *filename, loff_t *size); int erofs_exists(const char *filename); void erofs_close(void); -- 2.36.1
[PATCH 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file(). Signed-off-by: Qu Wenruo --- Several cleanup candicate: 1. ext_fs->blksz is never populated, thus it's always 0 We can not easily grab blocksize just like btrfs in this case. Thus we have to go the same calculation in ext4fs_read_file(). 2. can not easily cleanup @skipfirst in ext4fs_read_file() It seems to screw up with soft link read. --- fs/ext4/ext4fs.c | 22 ++ fs/fs.c | 2 +- include/ext4fs.h | 1 + 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c index 4c89152ce4ad..b0568e77a895 100644 --- a/fs/ext4/ext4fs.c +++ b/fs/ext4/ext4fs.c @@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, short status; struct ext_block_cache cache; + assert(IS_ALIGNED(pos, blocksize)); + ext_cache_init(); /* Adjust len so it we can't read past the end of the file. */ @@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ext4fs_read(buf, offset, len, len_read); } +int ext4_get_blocksize(const char *filename) +{ + struct ext_filesystem *fs; + int log2blksz; + int log2_fs_blocksize; + loff_t file_len; + int ret; + + ret = ext4fs_open(filename, _len); + if (ret < 0) { + printf("** File not found %s **\n", filename); + return -1; + } + fs = get_fs(); + log2blksz = fs->dev_desc->log2blksz; + log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz; + + return (1 << (log2_fs_blocksize + log2blksz)); +} + int ext4fs_uuid(char *uuid_str) { if (ext4fs_root == NULL) diff --git a/fs/fs.c b/fs/fs.c index f9df8d9ec9a4..3cd2c88a4895 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ext4_get_blocksize, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, diff --git a/include/ext4fs.h b/include/ext4fs.h index cb5d9cc0a5c0..0f4cf32dcc2a 100644 --- a/include/ext4fs.h +++ b/include/ext4fs.h @@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int ext4_get_blocksize(const char *filename); int ext4_read_superblock(char *buffer); int ext4fs_uuid(char *uuid_str); void ext_cache_init(struct ext_block_cache *cache); -- 2.36.1
[PATCH 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code to handle the unaligned read (aka, read range is not aligned to fs block size). This makes underlying fs code harder to implement, as they have to handle unaligned read all by themselves. This patch will change the behavior, starting from btrfs, by moving the unaligned read code into _fs_read(). The idea is pretty simple, if we have an unaligned read request, we handle it in the following steps: 1. Grab the blocksize of the fs 2. Read the leading unaligned range We will read the block that @offset is in, and copy the requested part into buf. The the block we read covers the whole range, we just call it a day. 3. Read the remaining part The tailing part may be unaligned, but all fses handles the tailing part much easier than the leading unaligned part. As they just need to do a min(extent_size, start + len - cur) to calculate the real read size. In fact, for most file reading, the file size is not aligned and we need to handle the tailing part anyway. There is a btrfs specific cleanup involved: - In btrfs_file_read(), merge the tailing unaligned read into the main loop. Just reuse the existing read length calculation is enough. - Remove read_and_truncate_page() call Since there is no explicit leading/tailing unaligned read anymore. This has been tested with a proper randomly populated btrfs file, then tried in sandbox mode with different aligned and unaligned range and compare the output with md5sum. Cc: Marek Behun Cc: linux-bt...@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 10 fs/btrfs/inode.c | 89 +++- fs/fs.c | 130 --- include/btrfs.h | 1 + 4 files changed, 141 insertions(+), 89 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index bf9e1f2f17cf..7c8f4a3dfb87 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, int ret; ASSERT(fs_info); + + /* Higher layer has ensures it never pass unaligned offset in. */ + ASSERT(IS_ALIGNED(offset, fs_info->sectorsize)); + ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID, file, , , , 40); if (ret < 0) { @@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return 0; } +int btrfs_get_blocksize(const char *filename) +{ + ASSERT(current_fs_info); + return current_fs_info->sectorsize; +} + void btrfs_close(void) { if (current_fs_info) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d00b5153336d..bef0a972f40d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -620,44 +620,6 @@ check_next: return 1; } -static int read_and_truncate_page(struct btrfs_path *path, - struct btrfs_file_extent_item *fi, - int start, int len, char *dest) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_fs_info *fs_info = leaf->fs_info; - u64 aligned_start = round_down(start, fs_info->sectorsize); - u8 extent_type; - char *buf; - int page_off = start - aligned_start; - int page_len = fs_info->sectorsize - page_off; - int ret; - - ASSERT(start + len <= aligned_start + fs_info->sectorsize); - buf = malloc_cache_aligned(fs_info->sectorsize); - if (!buf) - return -ENOMEM; - - extent_type = btrfs_file_extent_type(leaf, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); - free(buf); - return len; - } - - ret = btrfs_read_extent_reg(path, fi, - round_down(start, fs_info->sectorsize), - fs_info->sectorsize, buf); - if (ret < 0) { - free(buf); - return ret; - } - memcpy(dest, buf + page_off, page_len); - free(buf); - return len; -} - int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, char *dest) { @@ -666,7 +628,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, struct btrfs_path path; struct btrfs_key key; u64 aligned_start = round_down(file_offset, fs_info->sectorsize); - u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize); u64 next_offset; u64 cur = aligned_start; int ret = 0; @@ -676,34 +637,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, /* Set the whole dest all zero, so we won't nee
[PATCH 6/8] fs: fat: rely on higher layer to get block aligned read range
Just implement fat_get_blocksize() for fat, so that fat_read_file() always get a block aligned read range. Unfortunately I'm not experienced enough to cleanup the fat code, thus further cleanup is appreciated. Cc: Tom Rini Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 13 + fs/fs.c | 2 +- include/fat.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index dcceccbcee0a..e13035e8e6d1 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ret; } +int fat_get_blocksize(const char *filename) +{ + fsdata fsdata = {0}; + int ret; + + ret = get_fs_info(); + if (ret) + return ret; + + free(fsdata.fatbuf); + return fsdata.sect_size; +} + typedef struct { struct fs_dir_stream parent; struct fs_dirent dirent; diff --git a/fs/fs.c b/fs/fs.c index 3cd2c88a4895..09625f52e913 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = fat_get_blocksize, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, diff --git a/include/fat.h b/include/fat.h index a9756fb4cd1b..c03a2bebecef 100644 --- a/include/fat.h +++ b/include/fat.h @@ -201,6 +201,7 @@ int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); int file_fat_read(const char *filename, void *buffer, int maxsize); +int fat_get_blocksize(const char *filename); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.36.1
[PATCH 3/8] fs: btrfs: fix a crash if specified range is beyond file size
[BUG] When try to read a range beyond file size, btrfs driver will cause crash/segfault: => load host 0 $kernel_addr_r 5k_file 0 0x2000 SEGFAULT [CAUSE] In btrfs_read(), if @len is 0, we will truncated it to file end, but if file end is beyond our file size, this truncation will underflow @len, making it -3K in this case. And later that @len is used to memzero the output buffer, resulting above crash. [FIX] Just error out if @offset is already beyond our file size. Now it will fail properly with correct error message: => load host 0 $kernel_addr_r 5m_origin 0 0x2000 BTRFS: Read range beyond file size, offset 8192 file size 5120 Failed to load '5m_origin' Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 9145727058d4..bf9e1f2f17cf 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return ret; } + if (offset >= real_size) { + error("Read range beyond file size, offset %llu file size %llu", + offset, real_size); + return -EINVAL; + } + /* * If the length is 0 (meaning read the whole file) or the range is * beyond file size, truncate it to the end of the file. -- 2.36.1
[PATCH 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0
[BUG] When testing with unaligned read, if a specific length is passed in, btrfs driver will read out nothing: => load host 0 $kernel_addr_r 5k_file 0x1000 0 0 bytes read in 0 ms But if no length is passed in, it works fine, even if we pass a non-zero length: => load host 0 $kernel_addr_r 5k_file 0 0x1000 1024 bytes read in 0 ms [CAUSE] In btrfs_read() if we have a larger size than our file, we will try to truncate it using the file size. However the real file size is not initialized if @len is not zero, thus we always truncate our length to 0, and cause the problem. [FIX] Fix it by just always do the file size check. In fact btrfs_size() always follow soft link, thus it will return the real file size correctly. Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 741c6e20f533..9145727058d4 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; } - if (!len) { - ret = btrfs_size(file, _size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; + ret = btrfs_size(file, _size); + if (ret < 0) { + error("Failed to get inode size: %s", file); + return ret; } - if (len > real_size - offset) + /* +* If the length is 0 (meaning read the whole file) or the range is +* beyond file size, truncate it to the end of the file. +*/ + if (!len || len > real_size - offset) len = real_size - offset; ret = btrfs_file_read(root, ino, offset, len, buf); -- 2.36.1
[PATCH 1/8] fs: fat: unexport file_fat_read_at()
That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; } -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread) +static int file_fat_read_at(const char *filename, loff_t pos, void *buffer, + loff_t maxsize, loff_t *actread) { fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread); int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.36.1
[PATCH 0/8] U-boot: fs: add generic unaligned read offset handling
[CHANGELOG] RFC->v1: - More (manual) testing Unfortunately, in the latest master (75967970850a), the fs-tests.sh always seems to hang at preparing the fs image. Thus still has to do manual testing, tested btrfs, ext4 and fat, with aligned and unaligned read, also added soft link read, all looks fine here. Extra testing is still appreciated. - Two more btrfs specific bug fixes All exposed during manual tests - Remove the tailing unaligned block handling In fact, all fses can easily handle such case, just a min() call is enough. - Remove the support for sandboxfs Since it's using read() calls, really no need to do block alignment check. - Enhanced blocksize check Ensure the returned blocksize is not only non-error, but also non-zero. This patchset can be fetched from github: https://github.com/adam900710/u-boot/tree/fs_unaligned_read [BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses. Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size). But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case. [ADVANTAGE] This patchset will handle unaligned read range in _fs_read(): - Get blocksize of the underlying fs - Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer. If the first block covers the whole range, we just call it aday. - Read the remaining range which starts at block aligned offset For most common case, which is 0 offset and 0 len, the code will not be changed at all. Just one extra get_blocksize() call, and for FAT/Btrfs/EROFS they all have cached blocksize, thus it takes almost no extra cost. Although for EXT4, it doesn't seem to cache the blocksize globally, thus has to do a path resolve and grab the blocksize. [DISADVANTAGE] The involved problem is: - Extra path resolving All those supported fs may have to do one extra path resolve if the read offset is not aligned. For EXT4, it will do one extra path resolve just to grab the blocksize. For data read which starts at offset 0 (the most common case), it should cause *NO* difference in performance. As the extra handling is only for unaligned offset. The common path is not really modified. [SUPPORTED FSES] - Btrfs (manually tested*) - Ext4 (manually tested) - FAT (manually tested) - Erofs - ubifs (unable to test, due to compile failure) *: Failed to get the test cases run, thus have to go sandbox mode, and attach an image with target fs, load the target file (with unaligned range) and compare the result using md5sum. For EXT4/FAT, they may need extra cleanup, as their existing unaligned range handling is no longer needed anymore, cleaning them up should free more code lines than the added one. Just not confident enough to modify them all by myself. [UNSUPPORTED FSES] - Squashfs They don't support non-zero offset, thus it can not handle the block aligned range. Need extra help to add block aligned offset support. - Semihostfs - Sandboxfs They all use read() directly, no need to do alignment check at all. Extra testing/feedback is always appreciated. Qu Wenruo (8): fs: fat: unexport file_fat_read_at() fs: btrfs: fix a bug which no data get read if the length is not 0 fs: btrfs: fix a crash if specified range is beyond file size fs: btrfs: move the unaligned read code to _fs_read() for btrfs fs: ext4: rely on _fs_read() to handle leading unaligned block read fs: fat: rely on higher layer to get block aligned read range fs: ubifs: rely on higher layer to do unaligned read fs: erofs: add unaligned read range handling fs/btrfs/btrfs.c | 33 --- fs/btrfs/inode.c | 89 +++-- fs/erofs/internal.h | 1 + fs/erofs/super.c | 6 ++ fs/ext4/ext4fs.c | 22 +++ fs/fat/fat.c | 17 +- fs/fs.c | 130 +++--- fs/ubifs/ubifs.c | 13 +++-- include/btrfs.h | 1 + include/erofs.h | 1 + include/ext4fs.h | 1 + include/fat.h | 3 +- include/ubifs_uboot.h | 1 + 13 files changed, 212 insertions(+), 106 deletions(-) -- 2.36.1
Re: [PATCH 0/8] u-boot: fs: add generic unaligned read handling
On 2022/6/28 22:17, Tom Rini wrote: On Tue, Jun 28, 2022 at 03:28:00PM +0800, Qu Wenruo wrote: [BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses. Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size). But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case. [ADVANTAGE] This patchset will handle unaligned read range in _fs_read(): - Get blocksize of the underlying fs - Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer. If the first block covers the whole range, we just call it aday. - Read the aligned range if there is any - Read the tailing block containing the unaligned range And copy the covered range into the destination. [DISADVANTAGE] There are mainly two problems: - Extra memory allocation for every _fs_read() call For the leading and tailing block. - Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read. This conceptually seems like a good thing. Can you please post some before/after times of reading large images from the supported filesystems? One thing to mention is, this change doesn't really bother large file read. As the patchset is splitting a large read into 3 parts: 1) Leading block 2) Aligned blocks, aka the main part of a large file 3) Tailing block Most time should still be spent on part 2), not much different than the old code. Part 1) and Part 3) are at most 2 blocks (aka, 2 * 4KiB for most modern large enough fses). So I doubt it would make any difference for large file read. Furthermore, as pointed out by Huang Jianan, currently the patchset can not handle read on soft link correctly, thus I'd update the series to do the split into even less parts: 1) Leading block For the unaligned initial block 2) Aligned blocks until the end The tailing block should still starts at a block aligned position, thus most filesystems is already handling them correctly. (Just a min(end, blockend) is enough for most cases already). Anyway, I'll try to craft some benchmarking for file reads using sandbox. But please don't expect much (or any) difference in that case. Thanks, Qu
Re: [PATCH RFC 2/8] fs: always get the file size in _fs_read()
On 2022/6/28 20:36, Huang Jianan wrote: Hi, wenruo, 在 2022/6/28 15:28, Qu Wenruo 写道: For _fs_read(), @len == 0 means we read the whole file. And we just pass @len == 0 to make each filesystem to handle it. In fact we have info->size() call to properly get the filesize. So we can not only call info->size() to grab the file_size for len == 0 case, but also detect invalid @len (e.g. @len > file_size) in advance or truncate @len. This behavior also allows us to handle unaligned better in the incoming patches. Signed-off-by: Qu Wenruo --- fs/fs.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index 6de1a3eb6d5d..d992cdd6d650 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, { struct fstype_info *info = fs_get_info(fs_type); void *buf; + loff_t file_size; int ret; #ifdef CONFIG_LMB @@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, } #endif - /* - * We don't actually know how many bytes are being read, since len==0 - * means read the whole file. - */ + ret = info->size(filename, _size); I get an error when running the erofs test cases. The return value isn't as expected when reading symlink file. For symlink file, erofs_size will return the size of the symlink itself here. Indeed, this is a problem, in fact I also checked other fses, it looks like we all just return the inode size for the softlink, thus size() call can not be relied on for those cases. While for the read() calls, every fs will do extra resolving for soft links, thus it doesn't cause problems. This can still be solved by not calling size() calls at all, and only do the unaligned read handling for the leading block. Thank you very much for pointing this bug out, would update the patchset for that. Thanks, Qu + if (ret < 0) { + log_err("** Unable to get file size for %s, %d **\n", + filename, ret); + return ret; + } + if (offset >= file_size) { + log_err( + "** Invalid offset, offset (%llu) >= file size (%llu)\n", + offset, file_size); + return -EINVAL; + + } + if (len == 0 || offset + len > file_size) { + if (len > file_size) + log_info( + "** Truncate read length from %llu to %llu, as file size is %llu **\n", + len, file_size, file_size); + len = file_size - offset; Then, we will get a wrong len in the case of len==0. So I think we need to do something extra with the symlink file? Thanks, Jianan + } buf = map_sysmem(addr, len); ret = info->read(filename, buf, offset, len, actread); unmap_sysmem(buf);
[PATCH RFC 8/8] fs: erofs: add unaligned read range handling
I'm not an expert on erofs, but my quick glance didn't expose any special handling on unaligned range, thus I think the U-boot erofs driver doesn't really support unaligned read range. This patch will add erofs_get_blocksize() so erofs can benefit from the generic unaligned read support. Cc: Huang Jianan Cc: linux-er...@lists.ozlabs.org Signed-off-by: Qu Wenruo --- fs/erofs/internal.h | 1 + fs/erofs/super.c| 6 ++ fs/fs.c | 2 +- include/erofs.h | 1 + 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 4af7c91560cc..d368a6481bf1 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -83,6 +83,7 @@ struct erofs_sb_info { u16 available_compr_algs; u16 lz4_max_distance; u32 checksum; + u32 blocksize; u16 extra_devices; union { u16 devt_slotoff; /* used for mkfs */ diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 4cca322b9ead..df01d2e719a7 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -99,7 +99,13 @@ int erofs_read_superblock(void) sbi.build_time = le64_to_cpu(dsb->build_time); sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec); + sbi.blocksize = 1 << blkszbits; memcpy(, dsb->uuid, sizeof(dsb->uuid)); return erofs_init_devices(, dsb); } + +int erofs_get_blocksize(const char *filename) +{ + return sbi.blocksize; +} diff --git a/fs/fs.c b/fs/fs.c index 41943e3a95db..0ef5fbdda5d6 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -366,7 +366,7 @@ static struct fstype_info fstypes[] = { .readdir = erofs_readdir, .ls = fs_ls_generic, .read = erofs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = erofs_get_blocksize, .size = erofs_size, .close = erofs_close, .closedir = erofs_closedir, diff --git a/include/erofs.h b/include/erofs.h index 1fbe82bf72cb..18bd6807c538 100644 --- a/include/erofs.h +++ b/include/erofs.h @@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc, struct disk_partition *fs_partition); int erofs_read(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); +int erofs_get_blocksize(const char *filename); int erofs_size(const char *filename, loff_t *size); int erofs_exists(const char *filename); void erofs_close(void); -- 2.36.1
[PATCH RFC 7/8] fs: ubifs: rely on higher layer to do unaligned read
Currently ubifs doesn't support unaligned read offset, thanks to the recent _fs_read() work to handle unaligned read, we only need to implement ubifs_get_blocksize() to take advantage of it. Now ubifs can do unaligned read without any problem. Signed-off-by: Qu Wenruo --- fs/fs.c | 2 +- fs/ubifs/ubifs.c | 13 - include/ubifs_uboot.h | 1 + 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index 337d5711c28c..41943e3a95db 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -302,7 +302,7 @@ static struct fstype_info fstypes[] = { .exists = ubifs_exists, .size = ubifs_size, .read = ubifs_read, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ubifs_get_blocksize, .write = fs_write_unsupported, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c index d3026e310168..a8ab556dd376 100644 --- a/fs/ubifs/ubifs.c +++ b/fs/ubifs/ubifs.c @@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t offset, *actread = 0; - if (offset & (PAGE_SIZE - 1)) { - printf("ubifs: Error offset must be a multiple of %d\n", - PAGE_SIZE); - return -1; - } + /* Higher layer should ensure it always pass page aligned range. */ + assert(IS_ALIGNED(offset, PAGE_SIZE)); + assert(IS_ALIGNED(size, PAGE_SIZE)); c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY); /* ubifs_findfile will resolve symlinks, so we know that we get @@ -920,6 +918,11 @@ out: return err; } +int ubifs_get_blocksize(const char *filename) +{ + return PAGE_SIZE; +} + void ubifs_close(void) { } diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h index b025779d59ff..bcd21715314a 100644 --- a/include/ubifs_uboot.h +++ b/include/ubifs_uboot.h @@ -29,6 +29,7 @@ int ubifs_exists(const char *filename); int ubifs_size(const char *filename, loff_t *size); int ubifs_read(const char *filename, void *buf, loff_t offset, loff_t size, loff_t *actread); +int ubifs_get_blocksize(const char *filename); void ubifs_close(void); #endif /* __UBIFS_UBOOT_H__ */ -- 2.36.1
[PATCH RFC 6/8] fs: sandboxfs: add sandbox_fs_get_blocksize()
This is to make sandboxfs to report blocksize it supports for _fs_read() to handle unaligned read. Unlike all other fses, sandboxfs can handle unaligned read/write without any problem since it's calling read()/write(), which doesn't bother the blocksize at all. This change is mostly to make testing of _fs_read() much easier. Cc: Simon Glass Signed-off-by: Qu Wenruo --- arch/sandbox/cpu/os.c | 11 +++ fs/fs.c| 2 +- fs/sandbox/sandboxfs.c | 14 ++ include/os.h | 8 include/sandboxfs.h| 1 + 5 files changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c index 5ea54179176c..6c29f29bdd9b 100644 --- a/arch/sandbox/cpu/os.c +++ b/arch/sandbox/cpu/os.c @@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count) return read(fd, buf, count); } +ssize_t os_get_blocksize(int fd) +{ + struct stat stat = {0}; + int ret; + + ret = fstat(fd, ); + if (ret < 0) + return -errno; + return stat.st_blksize; +} + ssize_t os_write(int fd, const void *buf, size_t count) { return write(fd, buf, count); diff --git a/fs/fs.c b/fs/fs.c index 7e4ead9b790b..337d5711c28c 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = { .exists = sandbox_fs_exists, .size = sandbox_fs_size, .read = fs_read_sandbox, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = sandbox_fs_get_blocksize, .write = fs_write_sandbox, .uuid = fs_uuid_unsupported, .opendir = fs_opendir_unsupported, diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c index 4ae41d5b4db1..130fee088621 100644 --- a/fs/sandbox/sandboxfs.c +++ b/fs/sandbox/sandboxfs.c @@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, void *buffer, return ret; } +int sandbox_fs_get_blocksize(const char *filename) +{ + int fd; + int ret; + + fd = os_open(filename, OS_O_RDONLY); + if (fd < 0) + return fd; + + ret = os_get_blocksize(fd); + os_close(fd); + return ret; +} + int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer, loff_t towrite, loff_t *actwrite) { diff --git a/include/os.h b/include/os.h index 10e198cf503e..a864d9ca39b2 100644 --- a/include/os.h +++ b/include/os.h @@ -26,6 +26,14 @@ struct sandbox_state; */ ssize_t os_read(int fd, void *buf, size_t count); +/** + * Get the optimial blocksize through stat() call. + * + * @fd:File descriptor as returned by os_open() + * Return: >=0 for the blocksize. <0 for error. + */ +ssize_t os_get_blocksize(int fd); + /** * Access to the OS write() system call * diff --git a/include/sandboxfs.h b/include/sandboxfs.h index 783dd5c88a73..6937068f7b82 100644 --- a/include/sandboxfs.h +++ b/include/sandboxfs.h @@ -32,6 +32,7 @@ void sandbox_fs_close(void); int sandbox_fs_ls(const char *dirname); int sandbox_fs_exists(const char *filename); int sandbox_fs_size(const char *filename, loff_t *size); +int sandbox_fs_get_blocksize(const char *filename); int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len, loff_t *actread); int fs_write_sandbox(const char *filename, void *buf, loff_t offset, -- 2.36.1
[PATCH RFC 5/8] fs: fat: rely on higher layer to get block aligned read range
Just implement fat_get_blocksize() for fat, so that fat_read_file() always get a block aligned read range. Unfortunately I'm not experienced enough to cleanup the fat code, thus further cleanup is appreciated. Cc: Tom Rini Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 13 + fs/fs.c | 2 +- include/fat.h | 1 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index dcceccbcee0a..e13035e8e6d1 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len, return ret; } +int fat_get_blocksize(const char *filename) +{ + fsdata fsdata = {0}; + int ret; + + ret = get_fs_info(); + if (ret) + return ret; + + free(fsdata.fatbuf); + return fsdata.sect_size; +} + typedef struct { struct fs_dir_stream parent; struct fs_dirent dirent; diff --git a/fs/fs.c b/fs/fs.c index e69a0968bb6d..7e4ead9b790b 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = { .exists = fat_exists, .size = fat_size, .read = fat_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = fat_get_blocksize, #if CONFIG_IS_ENABLED(FAT_WRITE) .write = file_fat_write, .unlink = fat_unlink, diff --git a/include/fat.h b/include/fat.h index a9756fb4cd1b..c03a2bebecef 100644 --- a/include/fat.h +++ b/include/fat.h @@ -201,6 +201,7 @@ int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); int file_fat_read(const char *filename, void *buffer, int maxsize); +int fat_get_blocksize(const char *filename); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.36.1
[PATCH RFC 4/8] fs: ext4: rely on _fs_read() to pass block aligned range into ext4fs_read_file()
Since _fs_read() is handling the unaligned read internally, ext4 driver only need to handle block aligned read. Unfortunately I'm not familiar with ext4 and its driver, thus not confident enough to cleanup all the unaligned read related code. So here we will have some dead code, and any help to clean them up is appreciated. Cc: Tom Rini Signed-off-by: Qu Wenruo --- fs/ext4/ext4fs.c | 11 +++ fs/fs.c | 2 +- include/ext4fs.h | 1 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c index 4c89152ce4ad..be2680994d8b 100644 --- a/fs/ext4/ext4fs.c +++ b/fs/ext4/ext4fs.c @@ -71,6 +71,10 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, ext_cache_init(); + /* Higher layer has ensured to pass block aligned range here. */ + assert(IS_ALIGNED(pos, blocksize)); + assert(IS_ALIGNED(len, blocksize)); + /* Adjust len so it we can't read past the end of the file. */ if (len + pos > filesize) len = (filesize - pos); @@ -183,6 +187,13 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos, return 0; } +int ext4fs_get_blocksize(const char *filename) +{ + struct ext_filesystem *fs = get_fs(); + + return fs->blksz; +} + int ext4fs_ls(const char *dirname) { struct ext2fs_node *dirnode = NULL; diff --git a/fs/fs.c b/fs/fs.c index 30696ac6c1a3..e69a0968bb6d 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = { .exists = ext4fs_exists, .size = ext4fs_size, .read = ext4_read_file, - .get_blocksize = fs_get_blocksize_unsupported, + .get_blocksize = ext4fs_get_blocksize, #ifdef CONFIG_CMD_EXT4_WRITE .write = ext4_write_file, .ln = ext4fs_create_link, diff --git a/include/ext4fs.h b/include/ext4fs.h index cb5d9cc0a5c0..cc40cfedd954 100644 --- a/include/ext4fs.h +++ b/include/ext4fs.h @@ -146,6 +146,7 @@ int ext4fs_create_link(const char *target, const char *fname); struct ext_filesystem *get_fs(void); int ext4fs_open(const char *filename, loff_t *len); int ext4fs_read(char *buf, loff_t offset, loff_t len, loff_t *actread); +int ext4fs_get_blocksize(const char *filename); int ext4fs_mount(unsigned part_length); void ext4fs_close(void); void ext4fs_reinit_global(void); -- 2.36.1
[PATCH RFC 3/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code to handle the unaligned read (aka, read range is not aligned to fs block size). This makes underlying fs code harder to implement, as unlike FUSE/kernel code, now they have to handle unaligned read all by themselves. This patch will change the behavior, starting from btrfs, by moving the unaligned read code into _fs_read(). The idea is pretty simple, if we have an unaligned read request, we handle it in the following steps: 1. Grab the blocksize of the fs 2. Read the leading unaligned range We will read the block that @offset is in, and copy the requested part into buf. The the block we read covers the whole range, we just call it a day. 3. Read the aligned part 4. Read the tailing unaligned range Pretty much the same as the leading unaligned range, just read the whole block where @offset + @len is, then copy the requested range in the buffer. This has been tested with a proper randomly populated btrfs file, then tried in sandbox mode with different aligned and unaligned range and compare the output with md5sum. Cc: Marek Behun Cc: linux-bt...@vger.kernel.org Signed-off-by: Qu Wenruo --- fs/btrfs/btrfs.c | 24 fs/btrfs/inode.c | 84 ++- fs/fs.c | 148 +-- include/btrfs.h | 1 + 4 files changed, 160 insertions(+), 97 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 741c6e20f533..f9a67468d508 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -223,17 +223,27 @@ out: return ret; } +int btrfs_get_blocksize(const char *filename) +{ + struct btrfs_fs_info *fs_info = current_fs_info; + + return fs_info->sectorsize; +} + int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, loff_t *actread) { struct btrfs_fs_info *fs_info = current_fs_info; struct btrfs_root *root; - loff_t real_size = 0; u64 ino; u8 type; int ret; ASSERT(fs_info); + + /* Higher layer should always pass correct @len in. */ + ASSERT(len); + ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID, file, , , , 40); if (ret < 0) { @@ -246,18 +256,6 @@ int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len, return -EINVAL; } - if (!len) { - ret = btrfs_size(file, _size); - if (ret < 0) { - error("Failed to get inode size: %s", file); - return ret; - } - len = real_size; - } - - if (len > real_size - offset) - len = real_size - offset; - ret = btrfs_file_read(root, ino, offset, len, buf); if (ret < 0) { error("An error occurred while reading file %s", file); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d00b5153336d..5615143fab82 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -620,44 +620,6 @@ check_next: return 1; } -static int read_and_truncate_page(struct btrfs_path *path, - struct btrfs_file_extent_item *fi, - int start, int len, char *dest) -{ - struct extent_buffer *leaf = path->nodes[0]; - struct btrfs_fs_info *fs_info = leaf->fs_info; - u64 aligned_start = round_down(start, fs_info->sectorsize); - u8 extent_type; - char *buf; - int page_off = start - aligned_start; - int page_len = fs_info->sectorsize - page_off; - int ret; - - ASSERT(start + len <= aligned_start + fs_info->sectorsize); - buf = malloc_cache_aligned(fs_info->sectorsize); - if (!buf) - return -ENOMEM; - - extent_type = btrfs_file_extent_type(leaf, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - ret = btrfs_read_extent_inline(path, fi, buf); - memcpy(dest, buf + page_off, min(page_len, ret)); - free(buf); - return len; - } - - ret = btrfs_read_extent_reg(path, fi, - round_down(start, fs_info->sectorsize), - fs_info->sectorsize, buf); - if (ret < 0) { - free(buf); - return ret; - } - memcpy(dest, buf + page_off, page_len); - free(buf); - return len; -} - int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, char *dest) { @@ -676,31 +638,12 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, /* Set the whole dest all zero, so we won't need to bother holes */ memset(dest, 0, len); - /* Read out the leading unaligned part */ - if (aligned_start !=
[PATCH RFC 2/8] fs: always get the file size in _fs_read()
For _fs_read(), @len == 0 means we read the whole file. And we just pass @len == 0 to make each filesystem to handle it. In fact we have info->size() call to properly get the filesize. So we can not only call info->size() to grab the file_size for len == 0 case, but also detect invalid @len (e.g. @len > file_size) in advance or truncate @len. This behavior also allows us to handle unaligned better in the incoming patches. Signed-off-by: Qu Wenruo --- fs/fs.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/fs/fs.c b/fs/fs.c index 6de1a3eb6d5d..d992cdd6d650 100644 --- a/fs/fs.c +++ b/fs/fs.c @@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, { struct fstype_info *info = fs_get_info(fs_type); void *buf; + loff_t file_size; int ret; #ifdef CONFIG_LMB @@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, loff_t offset, loff_t len, } #endif - /* -* We don't actually know how many bytes are being read, since len==0 -* means read the whole file. -*/ + ret = info->size(filename, _size); + if (ret < 0) { + log_err("** Unable to get file size for %s, %d **\n", + filename, ret); + return ret; + } + if (offset >= file_size) { + log_err( + "** Invalid offset, offset (%llu) >= file size (%llu)\n", + offset, file_size); + return -EINVAL; + + } + if (len == 0 || offset + len > file_size) { + if (len > file_size) + log_info( + "** Truncate read length from %llu to %llu, as file size is %llu **\n", +len, file_size, file_size); + len = file_size - offset; + } buf = map_sysmem(addr, len); ret = info->read(filename, buf, offset, len, actread); unmap_sysmem(buf); -- 2.36.1
[PATCH RFC 1/8] fs: fat: unexport file_fat_read_at()
That function is only utilized inside fat driver, unexport it. Signed-off-by: Qu Wenruo --- fs/fat/fat.c | 4 ++-- include/fat.h | 2 -- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/fat/fat.c b/fs/fat/fat.c index df9ea2c028fc..dcceccbcee0a 100644 --- a/fs/fat/fat.c +++ b/fs/fat/fat.c @@ -1243,8 +1243,8 @@ out_free_itr: return ret; } -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread) +static int file_fat_read_at(const char *filename, loff_t pos, void *buffer, + loff_t maxsize, loff_t *actread) { fsdata fsdata; fat_itr *itr; diff --git a/include/fat.h b/include/fat.h index bd8e450b33a3..a9756fb4cd1b 100644 --- a/include/fat.h +++ b/include/fat.h @@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect) int file_fat_detectfs(void); int fat_exists(const char *filename); int fat_size(const char *filename, loff_t *size); -int file_fat_read_at(const char *filename, loff_t pos, void *buffer, -loff_t maxsize, loff_t *actread); int file_fat_read(const char *filename, void *buffer, int maxsize); int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info); int fat_register_device(struct blk_desc *dev_desc, int part_no); -- 2.36.1
[PATCH 0/8] u-boot: fs: add generic unaligned read handling
[BACKGROUND] Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code just pass the request range to underlying fses. Under most case, this works fine, as U-boot only really needs to read the whole file (aka, 0 for both offset and len, len will be later determined using file size). But if some advanced user/script wants to extract kernel/initramfs from combined image, we may need to do unaligned read in that case. [ADVANTAGE] This patchset will handle unaligned read range in _fs_read(): - Get blocksize of the underlying fs - Read the leading block contianing the unaligned range The full block will be stored in a local buffer, then only copy the bytes in the unaligned range into the destination buffer. If the first block covers the whole range, we just call it aday. - Read the aligned range if there is any - Read the tailing block containing the unaligned range And copy the covered range into the destination. [DISADVANTAGE] There are mainly two problems: - Extra memory allocation for every _fs_read() call For the leading and tailing block. - Extra path resolving All those supported fs will have to do extra path resolving up to 2 times (one for the leading block, one for the tailing block). This may slow down the read. [SUPPORTED FSES] - Btrfs (manually tested*) - Ext4 (manually tested) - FAT (manually tested) - Erofs - sandboxfs - ubifs *: Failed to get the test cases run, thus have to go sandbox mode, and attach an image with target fs, load the target file (with unaligned range) and compare the result using md5sum. For EXT4/FAT, they may need extra cleanup, as their existing unaligned range handling is no longer needed anymore, cleaning them up should free more code lines than the added one. Just not confident enough to modify them all by myself. [UNSUPPORTED FSES] - Squashfs They don't support non-zero offset, thus it can not handle the tailing block reading. Need extra help to add block aligned offset support. - Semihostfs It's using hardcoded trap to do system calls, not sure how it would work for stat() call. Extra testing/feedback is always appreciated. Qu Wenruo (8): fs: fat: unexport file_fat_read_at() fs: always get the file size in _fs_read() fs: btrfs: move the unaligned read code to _fs_read() for btrfs fs: ext4: rely on _fs_read() to pass block aligned range into ext4fs_read_file() fs: fat: rely on higher layer to get block aligned read range fs: sandboxfs: add sandbox_fs_get_blocksize() fs: ubifs: rely on higher layer to do unaligned read fs: erofs: add unaligned read range handling arch/sandbox/cpu/os.c | 11 +++ fs/btrfs/btrfs.c | 24 +++--- fs/btrfs/inode.c | 84 ++-- fs/erofs/internal.h| 1 + fs/erofs/super.c | 6 ++ fs/ext4/ext4fs.c | 11 +++ fs/fat/fat.c | 17 - fs/fs.c| 169 +++-- fs/sandbox/sandboxfs.c | 14 fs/ubifs/ubifs.c | 13 ++-- include/btrfs.h| 1 + include/erofs.h| 1 + include/ext4fs.h | 1 + include/fat.h | 3 +- include/os.h | 8 ++ include/sandboxfs.h| 1 + include/ubifs_uboot.h | 1 + 17 files changed, 258 insertions(+), 108 deletions(-) -- 2.36.1
Re: [PATCH 1/1] btrfs: simplify lookup_data_extent()
On 2022/5/11 06:48, Qu Wenruo wrote: On 2022/5/11 03:43, Heinrich Schuchardt wrote: After returning if ret <= 0 we know that ret > 0. No need to check it. Signed-off-by: Heinrich Schuchardt Reviewed-by: Qu Wenruo Just to mention for other guys in the btrfs list, this patch is for U-boot btrfs implementation. And I also checked btrfs-fuse project, which has a similar function, lookup_file_extent(), it already does the check properly and even do extra quick exit for (ret > 0 && path->slots[0] == 0) case. So you may want to also check btrfs-fuse project to find some possible optimization and cross-port to U-boot. (So far btrfs-fuse has better test coverage using fsstress, and cross-checked against kernel). Thanks, Qu Thanks, Qu --- fs/btrfs/inode.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d00b515333..0173d30cd8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -546,15 +546,12 @@ static int lookup_data_extent(struct btrfs_root *root, struct btrfs_path *path, /* Error or we're already at the file extent */ if (ret <= 0) return ret; - if (ret > 0) { - /* Check previous file extent */ - ret = btrfs_previous_item(root, path, ino, - BTRFS_EXTENT_DATA_KEY); - if (ret < 0) - return ret; - if (ret > 0) - goto check_next; - } + /* Check previous file extent */ + ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY); + if (ret < 0) + return ret; + if (ret > 0) + goto check_next; /* Now the key.offset must be smaller than @file_offset */ btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); if (key.objectid != ino ||
Re: [PATCH 1/1] btrfs: simplify lookup_data_extent()
On 2022/5/11 03:43, Heinrich Schuchardt wrote: After returning if ret <= 0 we know that ret > 0. No need to check it. Signed-off-by: Heinrich Schuchardt Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/inode.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d00b515333..0173d30cd8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -546,15 +546,12 @@ static int lookup_data_extent(struct btrfs_root *root, struct btrfs_path *path, /* Error or we're already at the file extent */ if (ret <= 0) return ret; - if (ret > 0) { - /* Check previous file extent */ - ret = btrfs_previous_item(root, path, ino, - BTRFS_EXTENT_DATA_KEY); - if (ret < 0) - return ret; - if (ret > 0) - goto check_next; - } + /* Check previous file extent */ + ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY); + if (ret < 0) + return ret; + if (ret > 0) + goto check_next; /* Now the key.offset must be smaller than @file_offset */ btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); if (key.objectid != ino ||
Handle unaligned read at fs layer?
Hi U-boot fs guys, With my previous rework on U-boot btrfs, and my own btrfs-fuse project, it turns out that, although U-boot implements its fs with a very fuse like interface, there are still some quality-of-life features missing. One of the most obvious one is making each fs to handle unaligned read. This is especially obvious when comparing btrfs-fuse btrfs_file_read() against U-boot btrfs_file_read(). They share almost the same main loop, but btrfs_file_read() in U-boot have two extra handling, for unaligned leading sector and unaligned tailing sector. So I just did a quick check on the other fses, it turns out that we have a very different behaviors: - FAT - Ext4 Have the handling for both tailing and ending block. - Btrfs My code, doing unaligned leading/tailing block manually. (the tailing part looks unnecessary though) - Ubifs Explicitly reject read with non-aligned offset. - Squshfs Explicitly reject read with non-zero offset. - Erofs Have the handling for tailing unaligned part, but doesn't seem to handle unaligned start part. Thus if we have a proper common routine for unaligned read routine (get the file length, read the first and the last unaligned block, then let the .read callback to handle the aligned range), then all fs drivers would have a much better life. Mind me to do such small but quality-of-life improve for all involved fses? (This will need a new call back to query the blocksize of each fs though) Thanks, Qu
Re: [PATCH] btrfs: Fix compilation on big endian systems
On 2022/4/7 20:51, Pali Rohár wrote: Fix following two compile errors on big endian systems: CC fs/btrfs/btrfs.o In file included from include/linux/byteorder/big_endian.h:107, from ./arch/powerpc/include/asm/byteorder.h:82, from ./arch/powerpc/include/asm/bitops.h:8, from include/linux/bitops.h:152, from include/uuid.h:9, from fs/btrfs/btrfs.c:10: fs/btrfs/conv-funcs.h: In function ‘btrfs_key_to_disk’: include/linux/byteorder/generic.h:90:21: error: ‘__cpu_to_le16’ undeclared (first use in this function); did you mean ‘__cpu_to_le16p’? #define cpu_to_le16 __cpu_to_le16 ^ fs/btrfs/conv-funcs.h:79:10: note: in expansion of macro ‘cpu_to_le16’ __u16: cpu_to_le16, \ ^~~ CC fs/btrfs/compression.o In file included from ./arch/powerpc/include/asm/unaligned.h:9, from fs/btrfs/compression.c:16: include/linux/unaligned/access_ok.h:6:19: error: redefinition of ‘get_unaligned_le16’ static inline u16 get_unaligned_le16(const void *p) ^~ In file included from fs/btrfs/ctree.h:16, from fs/btrfs/btrfs.h:12, from fs/btrfs/compression.c:8: include/linux/unaligned/le_byteshift.h:40:19: note: previous definition of ‘get_unaligned_le16’ was here static inline u16 get_unaligned_le16(const void *p) ^~ Include file asm/unaligned.h contains arch specific macros and functions for unaligned access as opposite to linux/unaligned le_byteshift.h which contains macros and functions specific to little endian systems only. Signed-off-by: Pali Rohár Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/btrfs.h | 2 +- fs/btrfs/ctree.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/btrfs.h b/fs/btrfs/btrfs.h index 7d8b395b2646..a52587e0637a 100644 --- a/fs/btrfs/btrfs.h +++ b/fs/btrfs/btrfs.h @@ -9,7 +9,7 @@ #define __BTRFS_BTRFS_H__ #include -#include "conv-funcs.h" +#include "ctree.h" extern struct btrfs_info btrfs_info; extern struct btrfs_fs_info *current_fs_info; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 219c410b189f..55112318a330 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include "kernel-shared/btrfs_tree.h" #include "crypto/hash.h"
Re: [PATCH] fs/btrfs: fix a bug that U-boot fs btrfs implementation doesn't handle NO_HOLE feature correctly
On 2021/12/27 14:11, Qu Wenruo wrote: [BUG] When passing a btrfs with NO_HOLE feature to U-boot, and if one file contains holes, then the hash of the file is not correct in U-boot: # mkfs.btrfs -f test.img # Since v5.15, mkfs defaults to NO_HOLES # mount test.img /mnt/btrfs # xfs_io -f -c "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/btrfs/file # md5sum /mnt/btrfs/file 277f3840b275c74d01e979ea9d75ac19 /mnt/btrfs/file # umount /mnt/btrfs # ./u-boot => host bind 0 /home/adam/test.img => ls host 0 < > 12288 Mon Dec 27 05:35:23 2021 file => load host 0 0x100 file 12288 bytes read in 0 ms => md5sum 0x100 0x3000 md5 for 0100 ... 01002fff ==> 855ffdbe4d0ccc5acab92e1b5330e4c1 The md5sum doesn't match at all. [CAUSE] In U-boot btrfs implementation, the function btrfs_read_file() has the following iteration for file extent iteration: /* Read the aligned part */ while (cur < aligned_end) { ret = lookup_data_extent(root, , ino, cur, _offset); if (ret < 0) goto out; if (ret > 0) { /* No next, direct exit */ if (!next_offset) { ret = 0; goto out; } } /* Read file extent */ But for NO_HOLES features, hole extents will not have any extent item for it. Thus if @cur is at a hole, lookup_data_extent() will just return >0, and update @next_offset. But we still believe there is some data to read for @cur for ret > 0 case, causing we read extent data from the next file extent. This means, what we do for above NO_HOLES btrfs is: - Read 4K data from disk to file offset [0, 4K) So far the data is still correct - Read 4K data from disk to file offset [4K, 8K) We didn't skip the 4K hole, but read the data at file offset [8K, 12K) into file offset [4K, 8K). This causes the checksum mismatch. [FIX] Add extra check to skip to the next non-hole range after lookup_data_extent(). Signed-off-by: Qu Wenruo --- This bug exposed another missing link, that we don't have good test coverage in U-boot btrfs. This is partially caused by the fact that, btrfs-progs code is not designed to read file contents, but just to check the cross-reference (aka, btrfs-check). If we really only want read-only support in U-boot, and don't ever plan to add write support, then I'd say the btrfs-fuse project (https://github.com/adam900710/btrfs-fuse/) is more suitable for U-boot. As that project already has full fs content verification selftest along with extra multi-device recovery tests. And shares the same code style between btrfs-progs/kernel. OK, things are not that bad. In fact, the btrfs_read_file() implementation in btrfs-fuse has the same naming, same lookup_file_extent() (just a little naming different than lookup_data_extent()), same parameter list. Just without the unaligned sector handling (handled by FUSE, and it may also be unnecessary for U-boot too), and already have the correct handling for lookup_file_extent(), thanks to the selftest. So this already means, it can be pretty easy for U-boot to take code from btrfs-fuse part by part, without huge refactor again. Thanks, Qu --- fs/btrfs/inode.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2c2379303d74..d00b5153336d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -717,6 +717,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, ret = 0; goto out; } + /* +* Find a extent gap, mostly caused by NO_HOLE feature. +* Just to next offset directly. +*/ + if (next_offset > cur) { + cur = next_offset; + continue; + } } fi = btrfs_item_ptr(path.nodes[0], path.slots[0], struct btrfs_file_extent_item);
[PATCH 1/2] lib: add BLAKE2 hash support
The code is cross-ported from BLAKE2 reference implementation (https://github.com/BLAKE2/BLAKE2). With minimal change to remove unused macros/features. Currently there is only one user inside U-boot (btrfs), and since it only utilize BLAKE2B, all other favors are all removed. Signed-off-by: Qu Wenruo --- include/u-boot/blake2.h | 94 lib/Kconfig | 8 + lib/Makefile | 1 + lib/blake2/blake2-impl.h | 163 lib/blake2/blake2b.c | 311 +++ 5 files changed, 577 insertions(+) create mode 100644 include/u-boot/blake2.h create mode 100644 lib/blake2/blake2-impl.h create mode 100644 lib/blake2/blake2b.c diff --git a/include/u-boot/blake2.h b/include/u-boot/blake2.h new file mode 100644 index ..4984023a1a2d --- /dev/null +++ b/include/u-boot/blake2.h @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: CC0-1.0 */ +/* + BLAKE2 reference source code package - reference C implementations + + Copyright 2012, Samuel Neves . You may use this under the + terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at + your option. The terms of these licenses can be found at: + + - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0 + - OpenSSL license : https://www.openssl.org/source/license.html + - Apache 2.0: http://www.apache.org/licenses/LICENSE-2.0 + + More information about the BLAKE2 hash function can be found at + https://blake2.net. +*/ + +/* + * Cross-ported from BLAKE2 (https://github.com/BLAKE2/BLAKE2). + * Modifications includes: + * + * - Remove unsupported compilers like MSC/CPP + * - Remove blake2s/blake2sp/blake2bp + * This blake2 implementation is mostly for btrfs, which only utilizes + * blake2b. + */ +#ifndef BLAKE2_H +#define BLAKE2_H + +#include +#include + +#if defined(_MSC_VER) +#define BLAKE2_PACKED(x) __pragma(pack(push, 1)) x __pragma(pack(pop)) +#else +#define BLAKE2_PACKED(x) x __attribute__((packed)) +#endif + + enum blake2b_constant + { +BLAKE2B_BLOCKBYTES = 128, +BLAKE2B_OUTBYTES = 64, +BLAKE2B_KEYBYTES = 64, +BLAKE2B_SALTBYTES = 16, +BLAKE2B_PERSONALBYTES = 16 + }; + + typedef struct blake2b_state__ + { +uint64_t h[8]; +uint64_t t[2]; +uint64_t f[2]; +uint8_t buf[BLAKE2B_BLOCKBYTES]; +size_t buflen; +size_t outlen; +uint8_t last_node; + } blake2b_state; + + BLAKE2_PACKED(struct blake2b_param__ + { +uint8_t digest_length; /* 1 */ +uint8_t key_length;/* 2 */ +uint8_t fanout;/* 3 */ +uint8_t depth; /* 4 */ +uint32_t leaf_length; /* 8 */ +uint32_t node_offset; /* 12 */ +uint32_t xof_length;/* 16 */ +uint8_t node_depth;/* 17 */ +uint8_t inner_length; /* 18 */ +uint8_t reserved[14]; /* 32 */ +uint8_t salt[BLAKE2B_SALTBYTES]; /* 48 */ +uint8_t personal[BLAKE2B_PERSONALBYTES]; /* 64 */ + }); + + typedef struct blake2b_param__ blake2b_param; + + /* Padded structs result in a compile-time error */ + enum { +BLAKE2_DUMMY_2 = 1/(int)(sizeof(blake2b_param) == BLAKE2B_OUTBYTES) + }; + + /* Streaming API */ + int blake2b_init( blake2b_state *S, size_t outlen ); + int blake2b_init_key( blake2b_state *S, size_t outlen, const void *key, size_t keylen ); + int blake2b_init_param( blake2b_state *S, const blake2b_param *P ); + int blake2b_update( blake2b_state *S, const void *in, size_t inlen ); + int blake2b_final( blake2b_state *S, void *out, size_t outlen ); + + /* Simple API */ + int blake2b( void *out, size_t outlen, const void *in, size_t inlen, const void *key, size_t keylen ); + + /* This is simply an alias for blake2b */ + int blake2( void *out, size_t outlen, const void *in, size_t inlen, const void *key, size_t keylen ); + +#endif diff --git a/lib/Kconfig b/lib/Kconfig index 807a4c6ade06..e9976c7425e2 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -357,6 +357,14 @@ endmenu menu "Hashing Support" +config BLAKE2 + bool "Enable BLAKE2 support" + help + This option enables support of hashing using BLAKE2B algorithm. + The hash is calculated in software. + The BLAKE2 algorithm produces a hash value (digest) between 1 and + 64 bytes. + config SHA1 bool "Enable SHA1 support" help diff --git a/lib/Makefile b/lib/Makefile index 5ddbc77ed6d8..7950e847a9e2 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -63,6 +63,7 @@ obj-$(CONFIG_$(SPL_)MD5) += md5.o obj-$(CONFIG_ECDSA) += ecdsa/ obj-$(CONFIG_$(SPL_)RSA) += rsa/ obj-$(CONFIG_HASH) += hash-checksum.o +obj-$(CONFIG_BLAKE2) += blake2/blake2b.o obj-$(CONFIG_SHA1) += sha1.o obj-$(CONFIG_SHA256) += sha256.o obj-$(CONFIG_SHA512) += sha512.o diff --git a/lib/blake2/blake2-impl.h b/lib/blake2/blake2-impl.h new file mode 100644 index ..7ee52fdf5ada --- /dev/null +++ b/lib/blake2/blake2-impl.h @@ -
[PATCH 2/2] fs/btrfs: add dependency on BLAKE2 hash
Now btrfs can utilize the newly intorudced BLAKE2 hash. Signed-off-by: Qu Wenruo --- fs/btrfs/Kconfig | 1 + fs/btrfs/crypto/hash.c | 14 ++ fs/btrfs/crypto/hash.h | 1 + fs/btrfs/disk-io.c | 2 ++ 4 files changed, 18 insertions(+) diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index a0b48c23b31a..e31afe595f31 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -6,6 +6,7 @@ config FS_BTRFS select ZSTD select RBTREE select SHA256 + select BLAKE2 help This provides a single-device read-only BTRFS support. BTRFS is a next-generation Linux file system based on the copy-on-write diff --git a/fs/btrfs/crypto/hash.c b/fs/btrfs/crypto/hash.c index fb51f6386cb1..891a2974be05 100644 --- a/fs/btrfs/crypto/hash.c +++ b/fs/btrfs/crypto/hash.c @@ -4,6 +4,7 @@ #include #include #include +#include #include static u32 btrfs_crc32c_table[256]; @@ -39,6 +40,19 @@ int hash_xxhash(const u8 *buf, size_t length, u8 *out) return 0; } +/* We use the full CSUM_SIZE(32) for BLAKE2B */ +#define BTRFS_BLAKE2_HASH_SIZE 32 +int hash_blake2(const u8 *buf, size_t length, u8 *out) +{ + blake2b_state S; + + blake2b_init(, BTRFS_BLAKE2_HASH_SIZE); + blake2b_update(, buf, length); + blake2b_final(, out, BTRFS_BLAKE2_HASH_SIZE); + + return 0; +} + int hash_crc32c(const u8 *buf, size_t length, u8 *out) { u32 crc; diff --git a/fs/btrfs/crypto/hash.h b/fs/btrfs/crypto/hash.h index d1ba1fa374e3..f9846038bfa3 100644 --- a/fs/btrfs/crypto/hash.h +++ b/fs/btrfs/crypto/hash.h @@ -9,6 +9,7 @@ void btrfs_hash_init(void); int hash_crc32c(const u8 *buf, size_t length, u8 *out); int hash_xxhash(const u8 *buf, size_t length, u8 *out); int hash_sha256(const u8 *buf, size_t length, u8 *out); +int hash_blake2(const u8 *buf, size_t length, u8 *out); u32 crc32c(u32 seed, const void * data, size_t len); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index eb7736d53e11..8043abc1bd60 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -126,6 +126,8 @@ int btrfs_csum_data(u16 csum_type, const u8 *data, u8 *out, size_t len) return hash_xxhash(data, len, out); case BTRFS_CSUM_TYPE_SHA256: return hash_sha256(data, len, out); + case BTRFS_CSUM_TYPE_BLAKE2: + return hash_blake2(data, len, out); default: printf("Unknown csum type %d\n", csum_type); return -EINVAL; -- 2.34.1
[PATCH] fs/btrfs: fix a bug that U-boot fs btrfs implementation doesn't handle NO_HOLE feature correctly
[BUG] When passing a btrfs with NO_HOLE feature to U-boot, and if one file contains holes, then the hash of the file is not correct in U-boot: # mkfs.btrfs -f test.img # Since v5.15, mkfs defaults to NO_HOLES # mount test.img /mnt/btrfs # xfs_io -f -c "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/btrfs/file # md5sum /mnt/btrfs/file 277f3840b275c74d01e979ea9d75ac19 /mnt/btrfs/file # umount /mnt/btrfs # ./u-boot => host bind 0 /home/adam/test.img => ls host 0 < > 12288 Mon Dec 27 05:35:23 2021 file => load host 0 0x100 file 12288 bytes read in 0 ms => md5sum 0x100 0x3000 md5 for 0100 ... 01002fff ==> 855ffdbe4d0ccc5acab92e1b5330e4c1 The md5sum doesn't match at all. [CAUSE] In U-boot btrfs implementation, the function btrfs_read_file() has the following iteration for file extent iteration: /* Read the aligned part */ while (cur < aligned_end) { ret = lookup_data_extent(root, , ino, cur, _offset); if (ret < 0) goto out; if (ret > 0) { /* No next, direct exit */ if (!next_offset) { ret = 0; goto out; } } /* Read file extent */ But for NO_HOLES features, hole extents will not have any extent item for it. Thus if @cur is at a hole, lookup_data_extent() will just return >0, and update @next_offset. But we still believe there is some data to read for @cur for ret > 0 case, causing we read extent data from the next file extent. This means, what we do for above NO_HOLES btrfs is: - Read 4K data from disk to file offset [0, 4K) So far the data is still correct - Read 4K data from disk to file offset [4K, 8K) We didn't skip the 4K hole, but read the data at file offset [8K, 12K) into file offset [4K, 8K). This causes the checksum mismatch. [FIX] Add extra check to skip to the next non-hole range after lookup_data_extent(). Signed-off-by: Qu Wenruo --- This bug exposed another missing link, that we don't have good test coverage in U-boot btrfs. This is partially caused by the fact that, btrfs-progs code is not designed to read file contents, but just to check the cross-reference (aka, btrfs-check). If we really only want read-only support in U-boot, and don't ever plan to add write support, then I'd say the btrfs-fuse project (https://github.com/adam900710/btrfs-fuse/) is more suitable for U-boot. As that project already has full fs content verification selftest along with extra multi-device recovery tests. And shares the same code style between btrfs-progs/kernel. --- fs/btrfs/inode.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2c2379303d74..d00b5153336d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -717,6 +717,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len, ret = 0; goto out; } + /* +* Find a extent gap, mostly caused by NO_HOLE feature. +* Just to next offset directly. +*/ + if (next_offset > cur) { + cur = next_offset; + continue; + } } fi = btrfs_item_ptr(path.nodes[0], path.slots[0], struct btrfs_file_extent_item); -- 2.34.1
Re: U-Boot btrfs implementation - will there by syncing?
On 2021/12/17 07:45, Qu Wenruo wrote: On 2021/12/17 07:05, Marek Behún wrote: Hello Qu and others, one of the points of Qu's reimplementation of U-Boot's btrfs code that made it somehow synced with btrfs-progs, was that the old implementation was, quote pretty different from btrfs-progs nor kernel, making it pretty hard to sync code between different projects Another point was that afterwards Qu wanted to bring some of the missing btrfs features into U-Boot, since they should be much easier to implement after the reimplementation. I was looking at current btrfs-progs, and noticed that there were many changes since then, but only one or two were ever synced to U-Boot. Yep, those changes are mostly preparation for the incoming extent-tree v2 feature, they bring no real behavior change for now. And I'm planning to cross-port them just to make later cross-port easier. Forgot to mention that, a lot of btrfs-progs changes are not to the core implementation, but ioctls/user-interafaces/check/image tools, thus you may need to do more filtering to figure out what's really needed to be backported. But I may not cover all changes, so if you find some aspects missing, feel free to point it out, or even better, try to crossport if possible. Thanks for the reminder. I would like to know if Qu, or anyone else, is planning to do this code-syncing, and maybe implement some of the missing features? Some missing features, are blocked by lacking of interfaces and knoledge. Currently the missing hash (BLAKE2) needs BLAKE2 support in U-boot, which is still not yet in U-boot. If really needed, I may crossport BLAKE2B to Uboot then implement the support in btrfs. The similar case is for multi-device support, I'm not familiar enough on how to iterate through all block devices in Uboot. If there is already such interfaces or someone could provide some help, I'm pretty happy to implement it in Uboot. The reason I am asking this is that in the meantime I've reached the opinion that the idea of code-syncing in this sense almost never really works. It is usually proposed by one person who ports the code from another project, and then keeps it synced for a time, but then stops doing it because they have too much other work, or change employer, or another reason. Personally speaking, my current employer (SUSE) is super awesome, and there is no plan to move to another employer. Furthermore, SUSE is a super strong supporter for btrfs, thus maintaining a common base to provide better btrfs support is definitely benefiting SUSE. Thus you could count on such long term support, and personally speaking I'm pretty happy to work on btrfs, even not paid for. So the long term support, at least from myself, should be something you can rely on. Although there are indeed cases where I'm buried by other btrfs work, thus such reminder is really helpful. Thanks, Qu It happened multiple times. One example is U-Boot's Hush shell, which was imported from busybox, with the intention to keep it in sync, but the reality is that U-Boot's implementation is now years (decade?) old, and there were so many ad-hoc fixes done in U-Boot that currently the only way to sync is basically to change the whole code (which will also cause issues with existing user scripts, but maybe it should be done anyway). Another reason why I am writing this is that I think that with the amount of work Qu put into that reimplementation last year, he could have instead implemented some of the new features in the old implementation and spend a similar amount of time doing that, and the result would be those new features in U-Boot for over a year by now. I doubt that, the original interface is completely different than the interface we have in kernel/progs. That means, if one wants to cross port the recent extent-tree v2 changes, it will be a hell. So in the long run, it's still beneficial, even they get de-synced somehow. Also, I've come to the opinion that maybe two different and maintained implementations are a good thing for such a project as a filesystem, similar to how multiple implementations of C/C++ compiler are a good thing. For the last few months I was on-and-off thinking about whether it would make sense to revert to the original implementation and then implement some of the missing features. But it would not make sense if Qu, or anyone else, is planning to do that code syncing and bringing of new features from btrfs-progs, so that's why I am asking. The worst example for different implementations are another bootloader, GRUB. Its btrfs implementation has way less features, no sanity checks, and is super easy to craft an image to cause infinite loop for GRUB. And it even can not handle the new default mkfs features (NO_HOLES, which is there for long long time). Thus I even re-implemented a new license compatible project from scratch, btrfs-fuse, to provide a basis for later cross-port. (That only has read-only
Re: U-Boot btrfs implementation - will there by syncing?
On 2021/12/17 07:05, Marek Behún wrote: Hello Qu and others, one of the points of Qu's reimplementation of U-Boot's btrfs code that made it somehow synced with btrfs-progs, was that the old implementation was, quote pretty different from btrfs-progs nor kernel, making it pretty hard to sync code between different projects Another point was that afterwards Qu wanted to bring some of the missing btrfs features into U-Boot, since they should be much easier to implement after the reimplementation. I was looking at current btrfs-progs, and noticed that there were many changes since then, but only one or two were ever synced to U-Boot. Yep, those changes are mostly preparation for the incoming extent-tree v2 feature, they bring no real behavior change for now. And I'm planning to cross-port them just to make later cross-port easier. Thanks for the reminder. I would like to know if Qu, or anyone else, is planning to do this code-syncing, and maybe implement some of the missing features? The reason I am asking this is that in the meantime I've reached the opinion that the idea of code-syncing in this sense almost never really works. It is usually proposed by one person who ports the code from another project, and then keeps it synced for a time, but then stops doing it because they have too much other work, or change employer, or another reason. It happened multiple times. One example is U-Boot's Hush shell, which was imported from busybox, with the intention to keep it in sync, but the reality is that U-Boot's implementation is now years (decade?) old, and there were so many ad-hoc fixes done in U-Boot that currently the only way to sync is basically to change the whole code (which will also cause issues with existing user scripts, but maybe it should be done anyway). Another reason why I am writing this is that I think that with the amount of work Qu put into that reimplementation last year, he could have instead implemented some of the new features in the old implementation and spend a similar amount of time doing that, and the result would be those new features in U-Boot for over a year by now. I doubt that, the original interface is completely different than the interface we have in kernel/progs. That means, if one wants to cross port the recent extent-tree v2 changes, it will be a hell. So in the long run, it's still beneficial, even they get de-synced somehow. Also, I've come to the opinion that maybe two different and maintained implementations are a good thing for such a project as a filesystem, similar to how multiple implementations of C/C++ compiler are a good thing. For the last few months I was on-and-off thinking about whether it would make sense to revert to the original implementation and then implement some of the missing features. But it would not make sense if Qu, or anyone else, is planning to do that code syncing and bringing of new features from btrfs-progs, so that's why I am asking. The worst example for different implementations are another bootloader, GRUB. Its btrfs implementation has way less features, no sanity checks, and is super easy to craft an image to cause infinite loop for GRUB. And it even can not handle the new default mkfs features (NO_HOLES, which is there for long long time). Thus I even re-implemented a new license compatible project from scratch, btrfs-fuse, to provide a basis for later cross-port. (That only has read-only support, and is mostly MIT licesnsed, the interface is almost the same as progs/kernel). It may be possible for simpler filesystems, but for complex filesysmtes like btrfs, a completely different implementation with completely different interfaces is really not an ideal situation. I don't really want to do it again and again, using different (and sometimes very crappy) interfaces to do the same work. Thanks, Qu Please give your opinions, people. Thanks. Marek
Re: [PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory
On 2021/9/17 22:25, Mark Kettenis wrote: Date: Fri, 17 Sep 2021 19:42:29 +0800 Content-Language: en-US On 2021/9/17 19:34, Mark Kettenis wrote: From: Qu Wenruo Date: Fri, 17 Sep 2021 19:02:35 +0800 When booting using U-boot -> systemd-boot (EFI payload) -> kernel on RK3399, systemd-boot by some bug can't execute its "devicetree" key correctly to load its proper dtb from files. In that case, it will use fallback dtb from U-boot, which can be out-of-date, and on RK3399, even the latest U-boot contains out-of-date dtb which can cause problems like invalid opp tables. And for systemd-boot, it doesn't provide any board specific dtb, but completely relies on the EFI environment provided by U-boot, thus if we can't find a good dtb, the fallback one will be used anyway. So this patch will workaround the problem by appending common linux dtbs directory to the existing "efi_dtb_prefixes" so that for systemd-boot it can use the existing fdt and boot properly. Why isn't the dtb installed in the standard location? Isn't standard location "/dtbs" other than "/dtb"? At least kernel puts its dtbs into "/dtbs" by default. Hmm, well, when I Alexander Graf introduced the code to load an updated DTB the intention was clearly to load it from the EFI System Partition (ESP). I doubt that's where "the kernel" sticks them by default. So I suspect the intention is that you copy the DTB to the /dtb directory on the ESP if needed. Clearly that isn't what you've done on your system. Well, for this U-boot > systemd-boot > kernel loading scheme, systemd-boot is responsible to load the fdt. But for now (the latest stable release), it doesn't support fdt loading at all, and only recently get the ability: https://github.com/systemd/systemd/pull/19417 So in theory, it doesn't need any fdt for systemd-boot. It's more like a workaround (and not really needed now). I suspect things have moved on a bit as the discussion in the systemd thread shows. One of the new developments is the EFI_DT_FIXUP_PROCOL feature that is clearly targeted at EFI bootloaders picking and loading the DT instead of U-Boot. Yep, I didn't notice the development in systemd part. In this light I would be somewhat reluctant adding more directories to efi_dtb_prefixes. Yeah, please discard the patch, it's really a workaround and not needed anymore. Thanks, Qu Slowing the boot process down because distro's can't agree where to put the files is a bit lame... Would it be better to make the search path configurable at config time? By that each distro should config their dtbs search path, which could further optimize the boot sequence by removing all other unnecessary prefixes. Not in my opinion. I consider it a bad thing if people configure U-Boot specifically for their distro. The whole idea of distroboot and EFI in U-Boot is to be able to boot just any OS with a generic U-Boot binary. Thanks, Qu Signed-off-by: Qu Wenruo --- include/config_distro_bootcmd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h index 2627c2a6a541..4ec87483eb65 100644 --- a/include/config_distro_bootcmd.h +++ b/include/config_distro_bootcmd.h @@ -151,7 +151,7 @@ "load ${devtype} ${devnum}:${distro_bootpart} " \ "${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \ \ - "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\ + "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0" \ "scan_dev_for_efi=" \ "setenv efi_fdtfile ${fdtfile}; " \ BOOTENV_EFI_SET_FDTFILE_FALLBACK \ -- 2.33.0
Re: [systemd-devel] Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)
On 2021/9/17 19:45, Lennart Poettering wrote: On Fr, 17.09.21 19:25, Qu Wenruo (w...@suse.com) wrote: Hi, I'm recently testing booting my RK3399 boards with the following boot sequence: U-boot -> systemd-boot (EFI payload) -> kernel Which provides much more flex than plain extlinux conf from U-boot. (More choice, easier to write config, runtime kernel change). So far "kernel" and "initramfs" key work fine. But I notice that "devicetree" key is not working properly. The Uboot fdt search path doesn't include "/dtbs" which is used by my distro, and my entry config specify the device-tree file like this: titleManjaroARM boot from nvme linux/Image devicetree/dtbs/rockchip/rk3399-rockpro64.dtb initrd/initramfs-linux.img optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7 Thus if systemd-boot doesn't load the correct device-tree, kernel will use the default fdt passed from Uboot, which is already out-of-date and can cause problems for the upstream kernel I used. Unfortunately, with above config, after booting the kernel, the fdt is the fallback one from Uboot, not loading the proper one specified by systemd-boot config. The proof I went is checking the opp table. I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom dtb which uses op1 tables. But the kernel only sees a very out-of-dated fdt, which some opp is even invalid. How could I continue debugging the missing link? Like what systemd-boot needs to load the device-tree? Or U-boot EFI environment lacks certain facility to support systemd-boot? Did you see this: https://github.com/systemd/systemd/pull/19417 Confirmed this pull fixes the problem. I only need to wait for next release to get it from my distro. Awesome! Thanks, Qu (and maybe this: https://github.com/systemd/systemd/pull/20601) maybe that addresses your issues? Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)
On 2021/9/17 19:45, Lennart Poettering wrote: On Fr, 17.09.21 19:25, Qu Wenruo (w...@suse.com) wrote: Hi, I'm recently testing booting my RK3399 boards with the following boot sequence: U-boot -> systemd-boot (EFI payload) -> kernel Which provides much more flex than plain extlinux conf from U-boot. (More choice, easier to write config, runtime kernel change). So far "kernel" and "initramfs" key work fine. But I notice that "devicetree" key is not working properly. The Uboot fdt search path doesn't include "/dtbs" which is used by my distro, and my entry config specify the device-tree file like this: titleManjaroARM boot from nvme linux/Image devicetree/dtbs/rockchip/rk3399-rockpro64.dtb initrd/initramfs-linux.img optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7 Thus if systemd-boot doesn't load the correct device-tree, kernel will use the default fdt passed from Uboot, which is already out-of-date and can cause problems for the upstream kernel I used. Unfortunately, with above config, after booting the kernel, the fdt is the fallback one from Uboot, not loading the proper one specified by systemd-boot config. The proof I went is checking the opp table. I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom dtb which uses op1 tables. But the kernel only sees a very out-of-dated fdt, which some opp is even invalid. How could I continue debugging the missing link? Like what systemd-boot needs to load the device-tree? Or U-boot EFI environment lacks certain facility to support systemd-boot? Did you see this: https://github.com/systemd/systemd/pull/19417 (and maybe this: https://github.com/systemd/systemd/pull/20601) Awesome! Let me try these PRs and report back! Thanks for the super-fast mention! Qu maybe that addresses your issues? Lennart -- Lennart Poettering, Berlin
Re: [PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory
On 2021/9/17 19:34, Mark Kettenis wrote: From: Qu Wenruo Date: Fri, 17 Sep 2021 19:02:35 +0800 When booting using U-boot -> systemd-boot (EFI payload) -> kernel on RK3399, systemd-boot by some bug can't execute its "devicetree" key correctly to load its proper dtb from files. In that case, it will use fallback dtb from U-boot, which can be out-of-date, and on RK3399, even the latest U-boot contains out-of-date dtb which can cause problems like invalid opp tables. And for systemd-boot, it doesn't provide any board specific dtb, but completely relies on the EFI environment provided by U-boot, thus if we can't find a good dtb, the fallback one will be used anyway. So this patch will workaround the problem by appending common linux dtbs directory to the existing "efi_dtb_prefixes" so that for systemd-boot it can use the existing fdt and boot properly. Why isn't the dtb installed in the standard location? Isn't standard location "/dtbs" other than "/dtb"? At least kernel puts its dtbs into "/dtbs" by default. Slowing the boot process down because distro's can't agree where to put the files is a bit lame... Would it be better to make the search path configurable at config time? By that each distro should config their dtbs search path, which could further optimize the boot sequence by removing all other unnecessary prefixes. Thanks, Qu Signed-off-by: Qu Wenruo --- include/config_distro_bootcmd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h index 2627c2a6a541..4ec87483eb65 100644 --- a/include/config_distro_bootcmd.h +++ b/include/config_distro_bootcmd.h @@ -151,7 +151,7 @@ "load ${devtype} ${devnum}:${distro_bootpart} " \ "${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \ \ - "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\ + "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0" \ "scan_dev_for_efi=" \ "setenv efi_fdtfile ${fdtfile}; " \ BOOTENV_EFI_SET_FDTFILE_FALLBACK \ -- 2.33.0
Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)
Hi, I'm recently testing booting my RK3399 boards with the following boot sequence: U-boot -> systemd-boot (EFI payload) -> kernel Which provides much more flex than plain extlinux conf from U-boot. (More choice, easier to write config, runtime kernel change). So far "kernel" and "initramfs" key work fine. But I notice that "devicetree" key is not working properly. The Uboot fdt search path doesn't include "/dtbs" which is used by my distro, and my entry config specify the device-tree file like this: titleManjaroARM boot from nvme linux/Image devicetree/dtbs/rockchip/rk3399-rockpro64.dtb initrd/initramfs-linux.img optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7 Thus if systemd-boot doesn't load the correct device-tree, kernel will use the default fdt passed from Uboot, which is already out-of-date and can cause problems for the upstream kernel I used. Unfortunately, with above config, after booting the kernel, the fdt is the fallback one from Uboot, not loading the proper one specified by systemd-boot config. The proof I went is checking the opp table. I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom dtb which uses op1 tables. But the kernel only sees a very out-of-dated fdt, which some opp is even invalid. How could I continue debugging the missing link? Like what systemd-boot needs to load the device-tree? Or U-boot EFI environment lacks certain facility to support systemd-boot? Thanks, Qu
[PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory
When booting using U-boot -> systemd-boot (EFI payload) -> kernel on RK3399, systemd-boot by some bug can't execute its "devicetree" key correctly to load its proper dtb from files. In that case, it will use fallback dtb from U-boot, which can be out-of-date, and on RK3399, even the latest U-boot contains out-of-date dtb which can cause problems like invalid opp tables. And for systemd-boot, it doesn't provide any board specific dtb, but completely relies on the EFI environment provided by U-boot, thus if we can't find a good dtb, the fallback one will be used anyway. So this patch will workaround the problem by appending common linux dtbs directory to the existing "efi_dtb_prefixes" so that for systemd-boot it can use the existing fdt and boot properly. Signed-off-by: Qu Wenruo --- include/config_distro_bootcmd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h index 2627c2a6a541..4ec87483eb65 100644 --- a/include/config_distro_bootcmd.h +++ b/include/config_distro_bootcmd.h @@ -151,7 +151,7 @@ "load ${devtype} ${devnum}:${distro_bootpart} " \ "${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \ \ - "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\ + "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0" \ "scan_dev_for_efi=" \ "setenv efi_fdtfile ${fdtfile}; " \ BOOTENV_EFI_SET_FDTFILE_FALLBACK \ -- 2.33.0
Way to change $scan_dev_for_boot search sequence?
Hi, Is there any way to change the $scan_dev_for_boot search sequence? Currently it always go extlinux first, then scripts, and finally EFI. Normally this is completely fine, as one system should only have one of these, thus search sequence makes no difference. But for certain distros, like Manjaro ARM, may migrate from current extlinux config to UEFI payload (like GRUB or systemd-boot, to provide a better way to choose from different boot configs). Thus current extlinux->scritps->EFI search priority makes it much harder to make EFI to be the default boot option. It looks like config_distro_bootcmd.h is here to define the sequence, and I see no config option to modify that. So I guess if we want to config the sequence, we have to modify that header, right? Thanks, Qu
Re: [PATCH] btrfs: Use default subvolume as filesystem root
On 2021/9/1 下午9:48, Matthias Brugger wrote: On 01/09/2021 13:36, Tom Rini wrote: On Wed, Sep 01, 2021 at 01:28:30PM +0200, Matthias Brugger wrote: Hi Tom, On 02/08/2021 01:06, Qu Wenruo wrote: On 2021/8/2 上午4:52, Matwey V. Kornilov wrote: BTRFS volume consists of a number of subvolumes which can be mounted separately from each other. The top-level subvolume always exists even if no subvolumes were created manually. A subvolume can be denoted as the default subvolume i.e. the subvolume which is mounted by default. The default "default subvolume" is the top-level one, but this is far from the common practices used in the wild. For instance, openSUSE provides an OS snapshot/rollback feature based on BTRFS. To achieve this, the actual OS root filesystem is located into a separate subvolume which is "default" but not "top-level". That means that the /boot/dtb/ directory is also located inside this default subvolume instead of top-level one. However, the existing btrfs u-boot driver always uses the top-level subvolume as the filesystem root. This behaviour 1) is inconsistent with mount /dev/sda1 /target command, which mount the default subvolume 2) leads to the issues when /boot/dtb cannot be found properly (see the reference). I also noticed the problem in the past, but forgot to fix it This patch uses the default subvolume as the filesystem root to overcome mentioned issues. Reference: https://bugzilla.suse.com/show_bug.cgi?id=1185656 Signed-off-by: Matwey V. Kornilov Reviewed-by: Qu Wenruo I can see that this patch is marked in your patchwork queue as "Need Review / ACK". Qu is one of our core btrfs developer who reviewed the patch. Apart from that we have it running on openSUSE on top of v2021.07 for some time without any issues. Ah, yup. Qu is one of the people I do look for to have reviewed a btrfs patch before I apply it (and I throw things under Need Review / ACK as a note-to-self to make sure a patch does have one, when I can expect one, before applying, FWIW). Would it be possible to merge this for v2021.10 or do you see any blocker here? I think I had mentally filed it was feature not bugfix and was going to hold off, but since you're asking, yes, I can grab it for this release. Thanks! It's a bugfix, as loading files from BTRFS, at least in openSUSE does not work without it. Just to be extra accurate, the original code always accesses subvol 5 as the entrance point, thus for openSUSE or any other distros setting other default subvolume, this will cause behavior difference between kernel and u-boot, thus unable to find the correct /boot path. This problem is cause by my rework using btrfs-progs code. All my fault. Btrfs-progs never needs to bother the default subvolume, as it has no way to "mount" the fs nor explore the directories/files structure. Thus it's a regression, and I'm not sure if we should use Fixes: tag. Just in case, the fixes tag is: Fixes: f06bfcf54d0e ("fs: btrfs: Crossport open_ctree_fs_info() from btrfs-progs") Thank all guys involved in this case. Qu Thanks for the quick answer! Matthias
Re: [PATCH 04/11] btrfs: Suppress the message about missing filesystem
On 2021/8/19 上午11:40, Simon Glass wrote: This message comes up a lot when scanning filesystems. It suggests to the user that there is some sort of error, but in fact there is no reason to expect that a particular partition has a btrfs filesystem. Other filesystems don't print this error. Turn it into a debug message. Signed-off-by: Simon Glass Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/disk-io.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 349411c3ccd..1b69346e231 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -886,7 +886,11 @@ static int btrfs_scan_fs_devices(struct blk_desc *desc, ret = btrfs_scan_one_device(desc, part, fs_devices, _devs); if (ret) { - fprintf(stderr, "No valid Btrfs found\n"); + /* +* Avoid showing this when probing for a possible Btrfs +* +* fprintf(stderr, "No valid Btrfs found\n"); +*/ return ret; } return 0; @@ -975,7 +979,7 @@ struct btrfs_fs_info *open_ctree_fs_info(struct blk_desc *desc, disk_super = fs_info->super_copy; ret = btrfs_read_dev_super(desc, part, disk_super); if (ret) { - printk("No valid btrfs found\n"); + debug("No valid btrfs found\n"); goto out_devices; }
Re: [PATCH] btrfs: Use default subvolume as filesystem root
On 2021/8/2 上午4:52, Matwey V. Kornilov wrote: BTRFS volume consists of a number of subvolumes which can be mounted separately from each other. The top-level subvolume always exists even if no subvolumes were created manually. A subvolume can be denoted as the default subvolume i.e. the subvolume which is mounted by default. The default "default subvolume" is the top-level one, but this is far from the common practices used in the wild. For instance, openSUSE provides an OS snapshot/rollback feature based on BTRFS. To achieve this, the actual OS root filesystem is located into a separate subvolume which is "default" but not "top-level". That means that the /boot/dtb/ directory is also located inside this default subvolume instead of top-level one. However, the existing btrfs u-boot driver always uses the top-level subvolume as the filesystem root. This behaviour 1) is inconsistent with mount /dev/sda1 /target command, which mount the default subvolume 2) leads to the issues when /boot/dtb cannot be found properly (see the reference). I also noticed the problem in the past, but forgot to fix it This patch uses the default subvolume as the filesystem root to overcome mentioned issues. Reference: https://bugzilla.suse.com/show_bug.cgi?id=1185656 Signed-off-by: Matwey V. Kornilov Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/disk-io.c | 38 +++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 349411c3cc..12f9579fcf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -804,6 +804,30 @@ static int setup_root_or_create_block(struct btrfs_fs_info *fs_info, return 0; } +static int get_default_subvolume(struct btrfs_fs_info *fs_info, +struct btrfs_key *key_ret) +{ + struct btrfs_root *root = fs_info->tree_root; + struct btrfs_dir_item *dir_item; + struct btrfs_path path; + int ret = 0; + + btrfs_init_path(); + + dir_item = btrfs_lookup_dir_item(NULL, root, , +BTRFS_ROOT_TREE_DIR_OBJECTID, +"default", 7, 0); + if (IS_ERR(dir_item)) { + ret = PTR_ERR(dir_item); + goto out; + } + + btrfs_dir_item_key_to_cpu(path.nodes[0], dir_item, key_ret); +out: + btrfs_release_path(); + return ret; +} + int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info) { struct btrfs_super_block *sb = fs_info->super_copy; @@ -833,9 +857,17 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info) fs_info->last_trans_committed = generation; - key.objectid = BTRFS_FS_TREE_OBJECTID; - key.type = BTRFS_ROOT_ITEM_KEY; - key.offset = (u64)-1; + ret = get_default_subvolume(fs_info, ); + if (ret) { + /* +* The default dir item isn't there. Linux kernel behaviour is +* to silently use the top-level subvolume in this case. +*/ + key.objectid = BTRFS_FS_TREE_OBJECTID; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = (u64)-1; + } + fs_info->fs_root = btrfs_read_fs_root(fs_info, ); if (IS_ERR(fs_info->fs_root))
Re: [PATCH 04/49] btrfs: Use U-Boot API for decompression
On 2021/5/4 上午7:10, Simon Glass wrote: Use the common function to avoid code duplication. Signed-off-by: Simon Glass Acked-by: Qu Wenruo Thanks, Qu --- (no changes since v1) fs/btrfs/compression.c | 51 +- 1 file changed, 5 insertions(+), 46 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 23efefa1997..7adfbb04a7c 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -6,6 +6,7 @@ */ #include "btrfs.h" +#include #include #include #include @@ -136,54 +137,12 @@ static u32 decompress_zlib(const u8 *_cbuf, u32 clen, u8 *dbuf, u32 dlen) static u32 decompress_zstd(const u8 *cbuf, u32 clen, u8 *dbuf, u32 dlen) { - ZSTD_DStream *dstream; - ZSTD_inBuffer in_buf; - ZSTD_outBuffer out_buf; - void *workspace; - size_t wsize; - u32 res = -1; - - wsize = ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT); - workspace = malloc(wsize); - if (!workspace) { - debug("%s: cannot allocate workspace of size %zu\n", __func__, - wsize); - return -1; - } - - dstream = ZSTD_initDStream(ZSTD_BTRFS_MAX_INPUT, workspace, wsize); - if (!dstream) { - printf("%s: ZSTD_initDStream failed\n", __func__); - goto err_free; - } + struct abuf in, out; - in_buf.src = cbuf; - in_buf.pos = 0; - in_buf.size = clen; + abuf_init_set(, (u8 *)cbuf, clen); + abuf_init_set(, dbuf, dlen); - out_buf.dst = dbuf; - out_buf.pos = 0; - out_buf.size = dlen; - - while (1) { - size_t ret; - - ret = ZSTD_decompressStream(dstream, _buf, _buf); - if (ZSTD_isError(ret)) { - printf("%s: ZSTD_decompressStream error %d\n", __func__, - ZSTD_getErrorCode(ret)); - goto err_free; - } - - if (in_buf.pos >= clen || !ret) - break; - } - - res = out_buf.pos; - -err_free: - free(workspace); - return res; + return zstd_decompress(, ); } u32 btrfs_decompress(u8 type, const char *c, u32 clen, char *d, u32 dlen)
[PATCH U-boot v2] fs: btrfs: fix the false alert of decompression failure
There are some cases where decompressed sectors can have padding zeros. In kernel code, we have lines to address such situation: /* * btrfs_getblock is doing a zero on the tail of the page too, * but this will cover anything missing from the decompressed * data. */ if (bytes < destlen) memset(kaddr+bytes, 0, destlen-bytes); kunmap_local(kaddr); But not in U-boot code, thus we have some reports of U-boot failed to read compressed files in btrfs. Fix it by doing the same thing of the kernel, for both inline and regular compressed extents. Reported-by: Matwey Kornilov Link: https://bugzilla.suse.com/show_bug.cgi?id=1183717 Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()") Signed-off-by: Qu Wenruo --- Changelog: v2: - Fix the bug for regular and inline compressed extents --- fs/btrfs/inode.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 019d532a1a4b..2c2379303d74 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -390,10 +390,16 @@ int btrfs_read_extent_inline(struct btrfs_path *path, csize); ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf, csize, dbuf, dsize); - if (ret < 0 || ret != dsize) { + if (ret == (u32)-1) { ret = -EIO; goto out; } + /* +* The compressed part ends before sector boundary, the remaining needs +* to be zeroed out. +*/ + if (ret < dsize) + memset(dbuf + ret, 0, dsize - ret); memcpy(dest, dbuf, dsize); ret = dsize; out: @@ -494,10 +500,16 @@ int btrfs_read_extent_reg(struct btrfs_path *path, ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf, csize, dbuf, dsize); - if (ret != dsize) { + if (ret == (u32)-1) { ret = -EIO; goto out; } + /* +* The compressed part ends before sector boundary, the remaining needs +* to be zeroed out. +*/ + if (ret < dsize) + memset(dbuf + ret, 0, dsize - ret); /* Then copy the needed part */ memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len); ret = len; -- 2.31.1
[PATCH UBoot] fs: btrfs: fix the false alert of decompression failure
There are some cases where decompressed sectors can have padding zeros. In kernel code, we have lines to address such situation: /* * btrfs_getblock is doing a zero on the tail of the page too, * but this will cover anything missing from the decompressed * data. */ if (bytes < destlen) memset(kaddr+bytes, 0, destlen-bytes); kunmap_local(kaddr); But not in U-boot code, thus we have some reports of U-boot failed to read compressed files in btrfs. Fix it by doing the same thing of the kernel. Reported-by: Matwey Kornilov Link: https://bugzilla.suse.com/show_bug.cgi?id=1183717 Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and btrfs_read_extent_reg()") Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 019d532a1a4b..f780c53d5250 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -390,10 +390,16 @@ int btrfs_read_extent_inline(struct btrfs_path *path, csize); ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf, csize, dbuf, dsize); - if (ret < 0 || ret != dsize) { + if (ret == (u32)-1) { ret = -EIO; goto out; } + /* +* The compressed part ends before sector boundary, the remaining needs +* to be zeroed out. +*/ + if (ret < dsize) + memset(dbuf + ret, 0, dsize - ret); memcpy(dest, dbuf, dsize); ret = dsize; out: -- 2.31.1
Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1
On 2021/2/11 上午12:21, Marek Behun wrote: On Wed, 10 Feb 2021 09:20:11 +0800 Qu Wenruo wrote: You're correct, the kernel is using new schema, btrfs_get_fs_root(), which only requires root objectid and completely get rid of the offset/type, which is far less possible to call with wrong parameters. It would be a good timing to sync the code between kernel and progs/u-boot now. So do you agree with this patch? If so, can you add Reviewed-by? Thanks. I mean, to change btrfs-progs interface to follow kernel btrfs_get_fs_root() schema, just pass objectid, without the need for btrfs_key. Thanks, Qu
Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1
On 2021/2/10 上午9:05, Marek Behun wrote: On Wed, 10 Feb 2021 08:09:14 +0800 Qu Wenruo wrote: On 2021/2/10 上午1:33, Marek Behún wrote: When the btrfs_read_fs_root() function is searching a ROOT_ITEM with location key offset other than -1, it currently fails via BUG_ON. The offset can have other value than -1, though. This can happen for example if a subvolume is renamed: $ btrfs subvolume create X && sync Create subvolume './X' $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: X$ location key (270 ROOT_ITEM 18446744073709551615) type DIR transid 283 data_len 0 name_len 1 name: X $ mv X Y && sync $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: Y$ location key (270 ROOT_ITEM 0) type DIR transid 285 data_len 0 name_len 1 name: Y As can be seen the offset changed from -1ULL to 0. Offset for subvolume ROOT_ITEM can be other values, especially for snapshot that offset is the transid when it get created. But the problem is, if we call btrfs_read_fs_root() for subvolume tree, the offset of the key really doesn't matter, the only important thing is the objectid. Thus we use that BUG_ON() to catch careless callers. Would you please provide a case where we wrongly call btrfs_read_fs_root() with incorrect offset inside btrfs-progs/uboot? I believe that would be the proper way to fix. Qu, this can be triggered in U-Boot when listing a directory containing a subvolume that was renamed: - create a subvolume && sync - rename subvolume && sync - umount, reboot, list the directory containing the subvolume in u-boot It will also break when you want to read a file that has a subvolume in it's path (e.g. `read mmc 0 0x1000 /renamed-subvol/file`). I found out this btrfs-progs commit: https://github.com/kdave/btrfs-progs/commit/10f1af0fe7de5a0310657993c7c21a1d78087e56 This commit ensures that while searching a directory recursively, when a ROOT_ITEM is encountered, the offset of its location is changed to -1 before passing the location to btrfs_read_fs_root(). That's what I expect the code to do, but you're right, if kernel is not doing it anymore, I prefer the kernel behavior. So maybe we could do this in u-boot as well, but why do this? Linux' btrfs driver does not check whether the offset is -1. So why do it here? You're correct, the kernel is using new schema, btrfs_get_fs_root(), which only requires root objectid and completely get rid of the offset/type, which is far less possible to call with wrong parameters. It would be a good timing to sync the code between kernel and progs/u-boot now. BTW, Qu, I think we have to change the BUG_ON code in U-Boot's btrfs driver. BUG_ON in U-Boot calls a complete SOC reset. We can't break whole U-Boot simply because btrfs partition contains broken data. U-Boot commands must fail in such a case, not reset the SOC. Well, progs (and even kernel) is a mine-field for BUG_ON()s. But at least for kernel, it's protected by tree-checker which rejects invalid on-disk data before it reaches btrfs code, thus mostly kernel BUG_ON()s are really hard to hit (a lot of them are even impossible to hit after the introduction of tree-checker), and indicate real problems. For now, the BUG_ON()s in U-boot still indicates problems that we can't really solve or doesn't expect at all in btrfs realm, e.g. the BUG_ON() you're hitting (call sites problem). I admit it's a pain in the ass for full SoC reset, but I don't have any better alternatives yet. The mid to long term solution would be introducing tree-checker to U-boot, so that the remaining BUG_ON()s are really code bugs. Thanks, Qu Marek
Re: [PATCH u-boot 1/2] fs: btrfs: skip xattrs in directory listing
On 2021/2/10 上午2:05, Marek Behún wrote: Skip xattrs in directory listing. U-Boot filesystem drivers do not list xattrs. Signed-off-by: Marek Behún Cc: David Sterba Cc: Qu Wenruo Cc: Tom Rini Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/btrfs.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 346b2c4341..6b4c5feb53 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -29,7 +29,6 @@ static int show_dir(struct btrfs_root *root, struct extent_buffer *eb, [BTRFS_FT_FIFO] = "FIFO", [BTRFS_FT_SOCK] = "SOCK", [BTRFS_FT_SYMLINK] = "SYMLINK", - [BTRFS_FT_XATTR]= "XATTR" }; u8 type = btrfs_dir_type(eb, di); char namebuf[BTRFS_NAME_LEN]; @@ -38,6 +37,10 @@ static int show_dir(struct btrfs_root *root, struct extent_buffer *eb, time_t mtime; int ret = 0; + /* skip XATTRs in directory listing */ + if (type == BTRFS_FT_XATTR) + return 0; + btrfs_dir_item_key_to_cpu(eb, di, ); if (key.type == BTRFS_ROOT_ITEM_KEY) {
Re: [PATCH u-boot 2/2] fs: btrfs: change directory list output to be aligned as before
On 2021/2/10 上午2:05, Marek Behún wrote: Since commit 325dd1f642dd ("fs: btrfs: Use btrfs_iter_dir() to ...") when btrfs is listing a directory, the output is not aligned: 15 Wed Sep 09 13:20:03 2020 boot.scr -> @/boot/boot.scr 0 Tue Feb 02 12:42:09 2021 @ 108 Tue Feb 02 12:54:04 2021 1.info Return back to how it was displayed previously, i.e.: 15 Wed Sep 09 13:20:03 2020 boot.scr -> @/boot/boot.scr 0 Tue Feb 02 12:42:09 2021 @ < >108 Tue Feb 02 12:54:04 2021 1.info Instead of '', print '< >', as ext4 driver. If an unknown directory item type is encountered, we will print the type number left padded with spaces, enclosed by '?', instead of '<' and '>', i.e.: ? 30?. name Signed-off-by: Marek Behún Fixes: 325dd1f642dd ("fs: btrfs: Use btrfs_iter_dir() to replace ...") Cc: David Sterba Cc: Qu Wenruo Cc: Tom Rini Reviewed-by: Qu Wenruo --- fs/btrfs/btrfs.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c index 6b4c5feb53..52a243a659 100644 --- a/fs/btrfs/btrfs.c +++ b/fs/btrfs/btrfs.c @@ -22,13 +22,13 @@ static int show_dir(struct btrfs_root *root, struct extent_buffer *eb, struct btrfs_inode_item ii; struct btrfs_key key; static const char* dir_item_str[] = { - [BTRFS_FT_REG_FILE] = "FILE", + [BTRFS_FT_REG_FILE] = " ", [BTRFS_FT_DIR] = "DIR", - [BTRFS_FT_CHRDEV] = "CHRDEV", - [BTRFS_FT_BLKDEV] = "BLKDEV", - [BTRFS_FT_FIFO] = "FIFO", - [BTRFS_FT_SOCK] = "SOCK", - [BTRFS_FT_SYMLINK] = "SYMLINK", + [BTRFS_FT_CHRDEV] = "CHR", + [BTRFS_FT_BLKDEV] = "BLK", + [BTRFS_FT_FIFO] = "FIF", + [BTRFS_FT_SOCK] = "SCK", + [BTRFS_FT_SYMLINK] = "SYM", Since btrfs-progs also use similar output for its dump-tree, I guess it's also possible to use the similar 3 chars output, except the FILE. Thanks, Qu }; u8 type = btrfs_dir_type(eb, di); char namebuf[BTRFS_NAME_LEN]; @@ -93,7 +93,7 @@ static int show_dir(struct btrfs_root *root, struct extent_buffer *eb, if (type < ARRAY_SIZE(dir_item_str) && dir_item_str[type]) printf("<%s> ", dir_item_str[type]); else - printf("DIR_ITEM.%u", type); + printf("?%3u? ", type); if (type == BTRFS_FT_CHRDEV || type == BTRFS_FT_BLKDEV) { ASSERT(key.type == BTRFS_INODE_ITEM_KEY); printf("%4llu,%5llu ", btrfs_stack_inode_rdev() >> 20,
Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1
On 2021/2/10 上午1:33, Marek Behún wrote: When the btrfs_read_fs_root() function is searching a ROOT_ITEM with location key offset other than -1, it currently fails via BUG_ON. The offset can have other value than -1, though. This can happen for example if a subvolume is renamed: $ btrfs subvolume create X && sync Create subvolume './X' $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: X$ location key (270 ROOT_ITEM 18446744073709551615) type DIR transid 283 data_len 0 name_len 1 name: X $ mv X Y && sync $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: Y$ location key (270 ROOT_ITEM 0) type DIR transid 285 data_len 0 name_len 1 name: Y As can be seen the offset changed from -1ULL to 0. Offset for subvolume ROOT_ITEM can be other values, especially for snapshot that offset is the transid when it get created. But the problem is, if we call btrfs_read_fs_root() for subvolume tree, the offset of the key really doesn't matter, the only important thing is the objectid. Thus we use that BUG_ON() to catch careless callers. Would you please provide a case where we wrongly call btrfs_read_fs_root() with incorrect offset inside btrfs-progs/uboot? I believe that would be the proper way to fix. Thanks, Qu Do not fail in this case. Signed-off-by: Marek Behún Cc: David Sterba Cc: Qu Wenruo Cc: Tom Rini --- fs/btrfs/disk-io.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b332ecb796..c6fdec95c1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -732,8 +732,7 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info *fs_info, return fs_info->chunk_root; if (location->objectid == BTRFS_CSUM_TREE_OBJECTID) return fs_info->csum_root; - BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID || - location->offset != (u64)-1); + BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID); node = rb_search(_info->fs_root_tree, (void *), btrfs_fs_roots_compare_objectids, NULL);
Re: [PATCH] fs: btrfs: Select SHA256 in Kconfig
On 2021/1/27 下午8:01, David Sterba wrote: On Wed, Jan 27, 2021 at 10:42:30AM +0100, matthias@kernel.org wrote: From: Matthias Brugger Since commit 565a4147d17a ("fs: btrfs: Add more checksum algorithms") btrfs uses the sha256 checksum algorithm. But Kconfig lacks to select it. This leads to compilation errors: fs/built-in.o: In function `hash_sha256': fs/btrfs/crypto/hash.c:25: undefined reference to `sha256_starts' fs/btrfs/crypto/hash.c:26: undefined reference to `sha256_update' fs/btrfs/crypto/hash.c:27: undefined reference to `sha256_finish' Signed-off-by: Matthias Brugger So this is a fix for u-boot, got me confused and not for the first time as there's Kconfig and the same fs/btrfs/ directory structure. Well, sometimes too unified file structure/code base can also be a problem. Considering I'm also going to continue cross-porting more code to U-boot, any recommendation on this? Using different prefix? Thanks, Qu
Re: [PATCH] fs: btrfs: Select SHA256 in Kconfig
On 2021/1/27 下午5:42, matthias@kernel.org wrote: From: Matthias Brugger Since commit 565a4147d17a ("fs: btrfs: Add more checksum algorithms") btrfs uses the sha256 checksum algorithm. But Kconfig lacks to select it. This leads to compilation errors: fs/built-in.o: In function `hash_sha256': fs/btrfs/crypto/hash.c:25: undefined reference to `sha256_starts' fs/btrfs/crypto/hash.c:26: undefined reference to `sha256_update' fs/btrfs/crypto/hash.c:27: undefined reference to `sha256_finish' Signed-off-by: Matthias Brugger Oh, forgot to select sha256. Thanks for the fix. Reviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index f302b1fbef..2a32f42ad1 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -4,6 +4,7 @@ config FS_BTRFS select LZO select ZSTD select RBTREE + select SHA256 help This provides a single-device read-only BTRFS support. BTRFS is a next-generation Linux file system based on the copy-on-write
Re: [PATCH 1/1] fs: btrfs: simplify close_ctree_fs_info()
On 2020/12/25 下午8:45, Heinrich Schuchardt wrote: At the beginning of close_ctree_fs_info() the value 0 is assigned to err and never changed before testing it. Let's get rid of the superfluous variable. Fixes: f06bfcf54d0e ("fs: btrfs: Crossport open_ctree_fs_info() from btrfs-progs") This is because current btrfs implementation of U-boot only support RO mount. In btrfs-progs, since we support RW support, we will try to commit transaction and we can hit error during transaction commit. Thus we use @ret to record current error and @error to record the final return value. I don't believe U-boot would support btrfs read-write any time soon, thus the cleanup should be OK. Reviewed-by: Qu Wenruo Thanks, Qu Signed-off-by: Heinrich Schuchardt --- fs/btrfs/disk-io.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 01e7cee520..b332ecb796 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1030,7 +1030,6 @@ out: int close_ctree_fs_info(struct btrfs_fs_info *fs_info) { int ret; - int err = 0; free_fs_roots_tree(_info->fs_root_tree); @@ -1038,9 +1037,7 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info) ret = btrfs_close_devices(fs_info->fs_devices); btrfs_cleanup_all_caches(fs_info); btrfs_free_fs_info(fs_info); - if (!err) - err = ret; - return err; + return ret; } int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid) -- 2.29.2
Re: [PATCH 4/4] fs: btrfs: initialize @ii in show_dir() to make coverity happy
On 2020/11/2 下午3:24, Marek Behun wrote: > On Mon, 2 Nov 2020 08:27:16 +0800 > Qu Wenruo wrote: > >> Thus I really tend to believe it's just a bug in coverity. >> All locations accessing @ii all have its key.type checked to ensure it get >> filled in the first place. > > If this is a bug in coverity, we should fix coverity, not add extra > code to U-Boot, even if it is just one instruction. That simply stinks > the same way like when systemd crashed when "debug" parameter was > present in /proc/cmdline, and they sent a patch to kernel which removed > the "debug" parameter, instead of fixing systemd. IMO. Yep, you're right. Please discard this patch. Thanks, Qu > > Marek >
Re: [PATCH 2/4] fs: btrfs: volumes: prevent overflow for multiplying
On 2020/11/2 上午8:20, Qu Wenruo wrote: > > > On 2020/11/2 上午7:02, Marek Behun wrote: >> On Sat, 31 Oct 2020 09:07:50 +0800 >> Qu Wenruo wrote: >> >>> In real world, this should not cause problem as we have device number >>> limit thus it won't go beyond 4G for a single stripe. >> >> So we can't run into this overflow in U-Boot because only one device is >> supported? But in Linux we can run into this issue? >> > In kernel, we have device number check, IIRC for default nodesize is > just several donzens of devices. > And since each stripe is only 64K fixed in btrfs, you can see it won't > go beyond 4G even in kernel. Sorry, the 64K stripe_len is only for RAID56, and for kernel, we always use u64 for stripe_len, thus we won't hit the problem. The problem is in btrfs-progs, where the code comes from, we use int, other than u64. Thanks for the hint for the root cause, would related values to be u64 for both btrfs-progs and u-boot to follow the kernel scheme. Thanks, Qu > > Thanks, > Qu > signature.asc Description: OpenPGP digital signature