Re: [PATCH v2] fs: btrfs: fix out of bounds write

2024-06-18 Thread Qu Wenruo




在 2024/6/19 07:11, Alex Shumsky 写道:

Fix btrfs_read/read_and_truncate_page write out of bounds of destination
buffer. Old behavior break bootstd malloc'd buffers of exact file size.
Previously this OOB write have not been noticed because distroboot usually
read files into huge static memory areas.

Signed-off-by: Alex Shumsky 
Fixes: e342718 ("fs: btrfs: Implement btrfs_file_read()")


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---

Changes in v2:
- fix error path handling
- add Fixes tag
- use min3

  fs/btrfs/inode.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4691612eda..3998ffc2c8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -640,7 +640,11 @@ static int read_and_truncate_page(struct btrfs_path *path,
extent_type = btrfs_file_extent_type(leaf, fi);
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
+   if (ret < 0) {
+   free(buf);
+   return ret;
+   }
+   memcpy(dest, buf + page_off, min3(page_len, ret, len));
free(buf);
return len;
}
@@ -652,7 +656,7 @@ static int read_and_truncate_page(struct btrfs_path *path,
free(buf);
return ret;
}
-   memcpy(dest, buf + page_off, page_len);
+   memcpy(dest, buf + page_off, min(page_len, len));
free(buf);
return len;
  }


Re: [PATCH] fs: btrfs: fix out of bounds write

2024-06-17 Thread Qu Wenruo




在 2024/6/18 05:19, Alex Shumsky 写道:

Fix btrfs_read/read_and_truncate_page write out of bounds of destination
buffer. Old behavior break bootstd malloc'd buffers of exact file size.
Previously this OOB write have not been noticed because distroboot usually
read files into huge static memory areas.

Signed-off-by: Alex Shumsky 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---

  fs/btrfs/inode.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4691612eda..b51f578b49 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -640,7 +640,7 @@ static int read_and_truncate_page(struct btrfs_path *path,
extent_type = btrfs_file_extent_type(leaf, fi);
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
+   memcpy(dest, buf + page_off, min(min(page_len, ret), len));
free(buf);
return len;
}
@@ -652,7 +652,7 @@ static int read_and_truncate_page(struct btrfs_path *path,
free(buf);
return ret;
}
-   memcpy(dest, buf + page_off, page_len);
+   memcpy(dest, buf + page_off, min(page_len, len));
free(buf);
return len;
  }


Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Qu Wenruo




在 2024/4/22 16:45, Qu Wenruo 写道:
[...]


I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.

printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n",
eb->start, eb->refs);




Just a small advice, in fact you can go with sandbox mode, running
U-boot in userspace, and bind a host file as a device to test the
filesystem code.

At least that's what I did for most U-boot bugs.

Thanks,
Qu


Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Qu Wenruo




在 2024/4/22 16:07, Sachi King 写道:

Hi,

I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.

A bit of history on the filesystem.  It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown.  I have checked the
filesystem with "btrfs check" which has found no errors.  The filesystem
mounts on Linux and is functional.


# btrfs check --check-data-csum /dev/sda2
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 5070573568 bytes used, no error found
total csum bytes: 4451620
total tree bytes: 370458624
total fs tree bytes: 353124352
total extent tree bytes: 10010624
btree space waste bytes: 62303284
file data blocks allocated: 6786519040
  referenced 6328619008


Since btrfs check reports no error, the fs must be valid.

But considering how new it is, it may be related to some new features
not properly implemented in Uboot.

Is it possible to provide the whole binary dump of the fs?




I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".

My testing qemu was produced with the following:

# make qemu-x86_64_defconfig
# cat << EOF >> .config
CONFIG_AUTOBOOT=y
CONFIG_BOOTDELAY=1
CONFIG_USE_BOOTCOMMAND=y
CONFIG_BOOTSTD_DEFAULTS=y
CONFIG_BOOTSTD_FULL=y
CONFIG_CMD_BOOTFLOW_FULL=y
CONFIG_BOOTCOMMAND="bootflow scan -lb"
CONFIG_ENV_IS_NOWHERE=y
CONFIG_LZ4=y
CONFIG_BZIP2=y
CONFIG_ZSTD=y
CONFIG_FS_BTRFS=y
CONFIG_CMD_BTRFS=y
CONFIG_GZIP=y
CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
EOF
# make -j24


bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:

/ {
 bootstd {
 compatible = "u-boot,boot-std";
 filename-prefixes = "/@boot/", "/boot/", "/";
 bootdev-order = "scsi";
 extlinux {
 compatible = "u-boot,extlinux";
 };
 };
};



The VM was run with

qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown 
-drive file=/mnt/dbg/u-boot-debug.img


The error message I recive on boot is

BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
BUG!
resetting ...



I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.

printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", eb->start, 
eb->refs);


The last message before the crash reported eb->start to be "0", with 0 refs.

free_extent_buffer_internal: eb->start[0] eb->refs[0]


The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.

I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:

free_extent_buffer_internal: eb->start[249c000] eb->refs[2]


Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2.  249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.

The failing boot log:

U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
Trying to boot from SPI
Jumping to 64-bit U-Boot: Note many features are missing


U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)

CPU:   QEMU Virtual CPU version 2.5+
DRAM:  128 MiB
Core:  13 devices, 13 uclasses, devicetree: separate
Loading Environment from nowhere... OK
Model: QEMU x86 (I440FX)
Net:   e1000: 00:00:00:00:00:00

Error: e1000#0 No valid MAC address found.
   eth_initialize() No ethernet found.


Hit any key to stop autoboot:  0
Scanning for bootflows in all bootdevs
Seq  Method   State   UclassPart  Name  Filename
---  ---  --        

scanning bus for devices...
Target spinup took 0 ms.
SATA link 1 timeout.
Target spinup took 0 ms.
SATA link 3 timeout.
SATA link 4 timeout.
SATA link 5 timeout.

Re: [PATCH] fs: btrfs: fix reading when length specified

2023-11-15 Thread Qu Wenruo




On 2023/11/12 01:49, Sam Edwards wrote:

The btrfs read function limits the read length to ensure that it
and the read offset do not together exceed the size of the file.
However, this size was only being queried if the read length was
passed a value of zero (meaning "whole file"), and the size is
defaulted to 0 otherwise. This means the clamp will just zero out
the length if one is specified, preventing reading of the file.

Fix this by checking the file size unconditionally, and unifying
the default length and clamping logic as a single range check instead.

This bug was discovered when trying to boot Linux with initrd= via
'bootefi' from a btrfs partition. The EFI stub entered an infinite
loop of zero-length reads while trying to read the initrd, and the
boot process stalled indefinitely.


I'm not sure why this happend for the EFI environment.

Doesn't the EFI runtime should also try to read the whole file? Or that
EFI environment has specified the length to read instead?



Signed-off-by: Sam Edwards 


Anyway, the fix looks good to me.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
  fs/btrfs/btrfs.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 4cdbbbe3d0..1149a3b200 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -228,7 +228,7 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
  {
struct btrfs_fs_info *fs_info = current_fs_info;
struct btrfs_root *root;
-   loff_t real_size = 0;
+   loff_t real_size;
u64 ino;
u8 type;
int ret;
@@ -246,16 +246,13 @@ int btrfs_read(const char *file, void *buf, loff_t 
offset, loff_t len,
return -EINVAL;
}

-   if (!len) {
-   ret = btrfs_size(file, _size);
-   if (ret < 0) {
-   error("Failed to get inode size: %s", file);
-   return ret;
-   }
-   len = real_size;
+   ret = btrfs_size(file, _size);
+   if (ret < 0) {
+   error("Failed to get inode size: %s", file);
+   return ret;
}

-   if (len > real_size - offset)
+   if (!len || len > real_size - offset)
len = real_size - offset;

ret = btrfs_file_read(root, ino, offset, len, buf);


Re: [PATCH] btrfs: fix some error checking for btrfs_decompress()

2023-08-03 Thread Qu Wenruo




On 2023/8/3 18:29, Dan Carpenter wrote:

The btrfs_decompress() function mostly (u32)-1 on error but it can
also return -EPERM or other kernel error codes from zstd_decompress().
The "ret" variable is an int, so we could just check for negatives.

Signed-off-by: Dan Carpenter 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
  fs/btrfs/inode.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 38e285bf94b0..4691612eda33 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -390,7 +390,7 @@ int btrfs_read_extent_inline(struct btrfs_path *path,
   csize);
ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi),
   cbuf, csize, dbuf, dsize);
-   if (ret == (u32)-1) {
+   if (ret < 0) {
ret = -EIO;
goto out;
}
@@ -500,7 +500,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,

ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf,
   csize, dbuf, dsize);
-   if (ret == (u32)-1) {
+   if (ret < 0) {
ret = -EIO;
goto out;
}


Re: [PATCH] fs: btrfs: Prevent error pointer dereference in list_subvolums()

2023-08-01 Thread Qu Wenruo




On 2023/7/26 14:59, Dan Carpenter wrote:

If btrfs_read_fs_root() fails with -ENOENT, then we go to the next
entry.  Fine.  But if it fails for a different reason then we need
to clean up and return an error code.  In the current code it
doesn't clean up but instead dereferences "root" and crashes.

Signed-off-by: Dan Carpenter 


Reviewed-by: Qu Wenruo 


---
I didn't CC the btrfs mailing list.  Perhaps, I should have?


This patch is fine. The function is specific to U-boot, and not utilized 
by kernel/btrfs-progs.


Thanks,
Qu



  fs/btrfs/subvolume.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/subvolume.c b/fs/btrfs/subvolume.c
index d446e7a2c418..68ca7e48e48e 100644
--- a/fs/btrfs/subvolume.c
+++ b/fs/btrfs/subvolume.c
@@ -199,6 +199,7 @@ static int list_subvolums(struct btrfs_fs_info *fs_info)
ret = PTR_ERR(root);
if (ret == -ENOENT)
goto next;
+   goto out;
}
ret = list_one_subvol(root, result);
if (ret < 0)


Re: [PATCH] global: Use proper project name U-Boot

2023-05-22 Thread Qu Wenruo




On 2023/5/17 15:17, Michal Simek wrote:

Use proper project name in comments, Kconfig, readmes.

Signed-off-by: Michal Simek 

[...]

diff --git a/fs/btrfs/compat.h b/fs/btrfs/compat.h
index 9cf8a10c76c5..02173dea5f48 100644
--- a/fs/btrfs/compat.h
+++ b/fs/btrfs/compat.h
@@ -46,7 +46,7 @@
  /*
   * Read data from device specified by @desc and @part
   *
- * U-boot equivalent of pread().
+ * U-Boot equivalent of pread().
   *
   * Return the bytes of data read.
   * Return <0 for error.
diff --git a/fs/btrfs/extent-io.h b/fs/btrfs/extent-io.h
index 6b0c87da969d..5c5c579d1eaf 100644
--- a/fs/btrfs/extent-io.h
+++ b/fs/btrfs/extent-io.h
@@ -8,7 +8,7 @@
   *   Use pointer to provide better alignment.
   * - Remove max_cache_size related interfaces
   *   Includes free_extent_buffer_nocache()
- *   As we don't cache eb in U-boot.
+ *   As we don't cache eb in U-Boot.
   * - Include headers
   *
   * Write related functions are kept as we still need to modify dummy extent


For the btrfs part:

Reviewed-by: Qu Wenruo 

Thanks,
Qu


Re: [PATCH U-BOOT v2] btrfs: fix offset when reading compressed extents

2023-04-18 Thread Qu Wenruo




On 2023/4/18 14:41, Dominique Martinet wrote:

From: Dominique Martinet 

btrfs_read_extent_reg correctly computed the extent offset in the
BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset'
part correctly in the compressed case, making the function read
incorrect data.

In the case I examined, the last 4k of a file was corrupted and
contained data from a few blocks prior, e.g. reading a 10k file with a
single extent:
btrfs_file_read()
  -> btrfs_read_extent_reg
 (aligned part loop, until 8k)
  -> read_and_truncate_page
-> btrfs_read_extent_reg
   (re-reads the last extent from 8k to the end,
   incorrectly reading the first 2k of data)

This can be reproduced as follow:
$ truncate -s 200M btr
$ mount btr -o compress /mnt
$ pat() { dd if=/dev/zero bs=1M count=$1 iflag=count_bytes status=none | tr '\0' 
"\\$2"; }
$ { pat 4K 1; pat 4K 2; pat 2K 3; }  > /mnt/file
$ sync
$ filefrag -v /mnt/file
File size of /mnt/file is 10240 (3 blocks of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: flags:
0:0..   2:   3328..  3330:  3: 
last,encoded,eof
$ umount /mnt

Then in u-boot:
=> load scsi 0 200 file
10240 bytes read in 3 ms (3.3 MiB/s)
=> md 2001ff0
02001ff0: 02020202 02020202 02020202 02020202  
02002000: 01010101 01010101 01010101 01010101  
02002010: 01010101 01010101 01010101 01010101  

(02002000 onwards should contain '03' pattern but went back to 01,
start of the extent)

After patch, data is read properly:
=> md 2001ff0
02001ff0: 02020202 02020202 02020202 02020202  
02002000: 03030303 03030303 03030303 03030303  
02002010: 03030303 03030303 03030303 03030303  

Note that the code previously (before commit e3427184f38a ("fs: btrfs:
Implement btrfs_file_read()")) did not split that read in two, so
this is a regression even if the previous code might not have been
handling offsets correctly either (something that booted now fails to
boot)

Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and 
btrfs_read_extent_reg()")
Signed-off-by: Dominique Martinet 


Reviewed-by: Qu Wenruo 

Thanks,
Qu


---
Changes in v2:
- Keep offset decomposition explicit where it is used
- Add reproducer/clarify explanation in commit message
- Drop other patches temporarily
- Link to v1: 
https://lore.kernel.org/r/20230418-btrfs-extent-reads-v1-0-47ba9839f...@codewreck.org
---
  fs/btrfs/inode.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40025662f250..38e285bf94b0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -511,7 +511,9 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
if (ret < dsize)
memset(dbuf + ret, 0, dsize - ret);
/* Then copy the needed part */
-   memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len);
+   memcpy(dest,
+  dbuf + btrfs_file_extent_offset(leaf, fi) + offset - key.offset,
+  len);
ret = len;
  out:
free(cbuf);

---
base-commit: 5db4972a5bbdbf9e3af48ffc9bc4fec73b7b6a79
change-id: 20230418-btrfs-extent-reads-e2df6e329ad4

Best regards,


Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end

2023-04-18 Thread Qu Wenruo




On 2023/4/18 11:53, Dominique Martinet wrote:

Qu Wenruo wrote on Tue, Apr 18, 2023 at 11:21:00AM +0800:

No, was just thinking the leading part being a separate loop doesn't
seem to make sense either as the code shouldn't care about sector size
alignemnt but about full extents.


The main concern related to the leading unaligned part is, we need to skip
something unaligned from the beginning , while all other situations never
need to skip such case (they at most skip the tailing part).


Ok, there is one exception for inline extents apparently.. But I'm not
still not convinced the `aligned_start != file_offset` check is enough
for that either; I'd say it's unlikely but the inline part can be
compressed, so we could have a file which has > 4k (sectorsize) of
expanded data, so a read from the 4k offset would skip the special
handling and fail (reading the whole extent in dest)


Btrfs inline has a limit to sectorsize.

That means, inlined compressed extent can at most be 4K sized (if 4K is 
our sector size).


So that won't be a problem.



Even if that's not possible, reading just the first 10 bytes of an
inline extent will be aligned and go through the main loop which just
reads the whole extent, so it'll need the same handling as the regular
btrfs_read_extent_reg handling at which point it might just as well
handle start offset too.


If we just read 10 bytes, the aligned part should handle it well.

My real concern is what if we read 10 bytes at offset 10 bytes.

If this can be handled in the same way of aligned read (and still be 
reasonable readable), then it would be awesome.





That aside taking the loop in order:
- lookup_data_extent doesn't care (used in heading/tail)
- skipping holes don't care as they explicitely put cursor at start of
next extent (or bail out if nothing next)
- inline needs fixing anyway as said above
- PREALLOC or nothing on disk also goes straight to next and is ok
- ah, I see what you meant now, we need to substract the current
position within the extent to extent_num_bytes...
That's also already a problem, though; to take the same example:
0 8k   16k
[extent1  | extent2 ... ]
reading from 4k onwards will try to read
min(extent_num_bytes, end-cur) = min(8k, 12k) = 8k
from the 4k offset which goes over the end of the extent.


That's indeed a problem.

As most of the Uboot fs drivers only care read the whole file, never 
really utilize the ability to read part of the file, that path is not 
properly tested.

(Some driver, IIRC ubifs?, doesn't even allow read with non-zero offset)

Thanks,
Qu



That could actually be my resets from the previous mail.


So either the first check should just lookup the extent and check that
extent start matches current offset instead of checking for sectorsize
alignment, or we can just fix the loop and remove the first if.



Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end

2023-04-17 Thread Qu Wenruo




On 2023/4/18 11:07, Dominique Martinet wrote:

Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:53:41AM +0800:

I have a feeling the loop could just be updated to go to the end
`while (cur < end)` as it doesn't seem to care about the end
alignment... Should I update v2 to do that instead?


Yeah, it would be very awesome if you can remove the tailing part
completely.


Ok, will give it a try.
I'll want to test this a bit so might take a day or two as I have other
work to finish first.


This made me look at the "Read out the leading unaligned part" initial
part, and its check only looks at the sector size but it probably has
the same problem -- we want to make sure we read any leftover from a
previous extent e.g. this file:


If you're talking about the same problem mentioned in patch 1, then yes, any
read in the middle of an extent would cause problems.


No, was just thinking the leading part being a separate loop doesn't
seem to make sense either as the code shouldn't care about sector size
alignemnt but about full extents.


The main concern related to the leading unaligned part is, we need to 
skip something unaligned from the beginning , while all other situations 
never need to skip such case (they at most skip the tailing part).


Maybe we can do some extra calculation to handle it, but I'm not in a 
hurry to address the leading part.


Another thing is, for all other fs-ish interface (kernel/fuse etc), they 
all read/write in certain block size, then handle unaligned part from a 
higher layer.
Thus I prefer to have most things handled in an aligned fashion, and 
only handle the unaligned part specially.


But if you can find an elegant way to handle all cases, that would be 
really awesome!


Thanks,
Qu


If the main loop handles everything correctly then the leading if can
also be removed along with "read_and_truncate_page" that would no longer
be used.

I'll give this a try as well and report back.


No matter if it's aligned or not, as btrfs would convert any unaligned part
into aligned read.


Yes I don't see where it would fail (except that my board does crash),
I guess that at this point I should spend some time building a qemu
u-boot and hooking up gdb will be faster..


My initial plan is to merge the tailing part into the aligned loop, but
since you're already working in this part, feel free to do it.


Yes, sure -- removing the if is easy, I'd just rather not make it fail
for someone else :)



Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end

2023-04-17 Thread Qu Wenruo




On 2023/4/18 10:41, Dominique Martinet wrote:

Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:02:00AM +0800:

/* Read the tailing unaligned part*/


Can we remove this part completely?

IIRC if we read until the target end, the unaligned end part can be
completely removed then.


The "Read the aligned part" loop stops at aligned_end:

while (cur < aligned_end)


So it should be possible that the last aligned extent we consider does
not contain data until the end, e.g. an offset that ends with the
aligned end:

040964123
[extent1-|extent2]

In this case the main loop will only read extent1 and we need the
"trailing unaligned part" if for extent2.

I have a feeling the loop could just be updated to go to the end
`while (cur < end)` as it doesn't seem to care about the end
alignment... Should I update v2 to do that instead?


Yeah, it would be very awesome if you can remove the tailing part 
completely.






This made me look at the "Read out the leading unaligned part" initial
part, and its check only looks at the sector size but it probably has
the same problem -- we want to make sure we read any leftover from a
previous extent e.g. this file:


If you're talking about the same problem mentioned in patch 1, then yes, 
any read in the middle of an extent would cause problems.


No matter if it's aligned or not, as btrfs would convert any unaligned 
part into aligned read.


But if we fix the bug you mentioned, I see nothing can going wrong.

Unaligned read is converted into aligned one (then copy the target range 
into the dest buf), and aligned part (including the tailing unaligned 
part) would be properly handled then.



0  4096   8192
[extent1--|extent2]
and reading from 4k with a sectorsize of 4k is probably bad will enter
the aligned main loop right away...
And I think that'll fail?...
Actually not quite sure what's expecting what to be aligned in there,
but I just tried some partial reads from non-zero offsets and my board
resets almost all the time so I guess I've found something else to dig
into.
This isn't a priority for me right now but I'll look a bit more when I
have more time if you haven't first.

My initial plan is to merge the tailing part into the aligned loop, but 
since you're already working in this part, feel free to do it.


And I'm very happy to provide any help on this.

Thanks,
Qu


Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()

2023-04-17 Thread Qu Wenruo




On 2023/4/18 10:05, Dominique Martinet wrote:

Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800:

The subject can be changed to "btrfs: fix offset when reading compressed
extents".
The original one is a little too generic.


Ok.


btrfs_file_read()
   -> btrfs_read_extent_reg
  (aligned part loop from 0x4048 to 0x40ba, 128KB at a time)
   last read had 0x4000 bytes in extent, but aligned_end - cur limited
   the read to 0x3000 (offset 0x72)
   -> read_and_truncate_page
 -> btrfs_read_extent_reg
reading the last 0x1000 bytes from offset 0x73000 (end of the
previous 0x4000) into 0x40ba3000
here key.offset = 0x7 so we need to use it to recover the
0x3000 offset.


You can use "btrfs ins dump-tree" to provide a much easier to read on-disk
data layout.


key.offset is the offset within the read call, not the offset on disk.
The file on disk looks perfectly normal, the condition to trigger the
bug is to have a file which size is not sector-aligned and where the
last extent is bigger than a sector.


Forgot to mention, this bug does not only affect the mentioned case, it 
affects all partial read of an extent.


Even if it's sector aligned.



I had a look at dump-tree early on and couldn't actually find my file in
there, now the problem is understood it should be easy to make a
reproducer so I'll add this info and commands needed to reproduce (+ a
link to a fs image just in case) in v2
  

/* Preallocated or hole , fill @dest with zero */
if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
@@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
u64 logical;
-   logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
- btrfs_file_extent_offset(leaf, fi) +
- offset - key.offset;
+   logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;


This is touching non-compressed path, which is very weird as your commit
message said this part is correct.


my rationale is that since this was considered once but forgotten the
other time it'll be easy to add yet another code path that forgets it
later, but I guess it won't change much anyway -- I'll fix the patch
making it explicit again.


Thanks,


Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()

2023-04-17 Thread Qu Wenruo




On 2023/4/18 10:05, Dominique Martinet wrote:

Qu Wenruo wrote on Tue, Apr 18, 2023 at 09:58:37AM +0800:

The subject can be changed to "btrfs: fix offset when reading compressed
extents".
The original one is a little too generic.


Ok.


btrfs_file_read()
   -> btrfs_read_extent_reg
  (aligned part loop from 0x4048 to 0x40ba, 128KB at a time)
   last read had 0x4000 bytes in extent, but aligned_end - cur limited
   the read to 0x3000 (offset 0x72)
   -> read_and_truncate_page
 -> btrfs_read_extent_reg
reading the last 0x1000 bytes from offset 0x73000 (end of the
previous 0x4000) into 0x40ba3000
here key.offset = 0x7 so we need to use it to recover the
0x3000 offset.


You can use "btrfs ins dump-tree" to provide a much easier to read on-disk
data layout.


key.offset is the offset within the read call, not the offset on disk.
The file on disk looks perfectly normal, the condition to trigger the
bug is to have a file which size is not sector-aligned and where the
last extent is bigger than a sector.


Yes, I understand the runtime offset is involved.

But you're still trying to explain the situation with involved on-disk 
file extent, right?


In that case it's way easier to go something like this:

  We can have a compressed file extent like this:

item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
generation 7 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 32768 ram 32768
extent compression 1 (zlib)

  Then if we try to read file range [4K, 8K) of inode 257 in Uboot, then
  we got corrupted data.

At least that's the preferred way to explain in btrfs community (with 
on-disk thus static part, then runtime part).




I had a look at dump-tree early on and couldn't actually find my file in
there, now the problem is understood it should be easy to make a
reproducer so I'll add this info and commands needed to reproduce (+ a
link to a fs image just in case) in v2


The reproducer is super easy.

 # mkfs.btrfs -f $dev
 # mount $dev -o compress $mnt
 # xfs_io -f -c "pwrite -i /dev/urandom 0 16K" $mnt/file
 # umount $mnt

Then in uboot
 # load host 0 $addr file 0x1000 0x1000
 # md $addr

And compare to regular xxd from kernel.

  

/* Preallocated or hole , fill @dest with zero */
if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
@@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
u64 logical;
-   logical = btrfs_file_extent_disk_bytenr(leaf, fi) +
- btrfs_file_extent_offset(leaf, fi) +
- offset - key.offset;
+   logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;


This is touching non-compressed path, which is very weird as your commit
message said this part is correct.


my rationale is that since this was considered once but forgotten the
other time it'll be easy to add yet another code path that forgets it
later, but I guess it won't change much anyway -- I'll fix the patch
making it explicit again.


OK, that makes sense now.

Thanks,
Qu




Thanks,


Re: [PATCH U-BOOT 3/3] btrfs: btfs_file_read: zero trailing data if no extent was found

2023-04-17 Thread Qu Wenruo




On 2023/4/18 09:17, Dominique Martinet wrote:

From: Dominique Martinet 

btfs_file_read's truncate path has a comment noting '>0 means no extent'
and bailing out immediately, but the buffer has not been written so
probably needs zeroing out.

This is a theorical fix only and hasn't been tested on a file that
actually runs this code path.


IIRC there is a memset() at the very beginning of btrfs_file_read() to 
set the whole dest memory to zero.


This is to handle cases like NO_HOLE cases, which we can skip hole extents.

Thanks,
Qu


Signed-off-by: Dominique Martinet 
---
  fs/btrfs/inode.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index efffec0f2e68..23c006c98c3b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -756,9 +756,12 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
btrfs_release_path();
ret = lookup_data_extent(root, , ino, cur,
 _offset);
-   /* <0 is error, >0 means no extent */
-   if (ret)
+   /* <0 is error, >0 means no extent: zero end of buffer */
+   if (ret) {
+   if (ret > 0)
+   memset(dest + cur, 0, end - cur);
goto out;
+   }
fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
struct btrfs_file_extent_item);
ret = read_and_truncate_page(, fi, cur,



Re: [PATCH U-BOOT 2/3] btrfs: btrfs_file_read: allow opportunistic read until the end

2023-04-17 Thread Qu Wenruo




On 2023/4/18 09:17, Dominique Martinet wrote:

From: Dominique Martinet 

btrfs_file_read main 'aligned read' loop would limit the last read to
the aligned end even if the data is present in the extent: there is no
reason not to read the data that we know will be presented correctly.

If that somehow fails cur only advances up to the aligned_end anyway and
the file tail is read through the old code unchanged; this could be
required if e.g. the end data is inlined.

Signed-off-by: Dominique Martinet 
---
  fs/btrfs/inode.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3d6e39e6544d..efffec0f2e68 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -663,7 +663,8 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
struct btrfs_path path;
struct btrfs_key key;
u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
-   u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
+   u64 end = file_offset + len;
+   u64 aligned_end = round_down(end, fs_info->sectorsize);
u64 next_offset;
u64 cur = aligned_start;
int ret = 0;
@@ -743,26 +744,26 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
extent_num_bytes = btrfs_file_extent_num_bytes(path.nodes[0],
   fi);
ret = btrfs_read_extent_reg(, fi, cur,
-   min(extent_num_bytes, aligned_end - cur),
+   min(extent_num_bytes, end - cur),
dest + cur - file_offset);
if (ret < 0)
goto out;
-   cur += min(extent_num_bytes, aligned_end - cur);
+   cur += min(extent_num_bytes, end - cur);
}
  
  	/* Read the tailing unaligned part*/


Can we remove this part completely?

IIRC if we read until the target end, the unaligned end part can be 
completely removed then.


Thanks,
Qu

-   if (file_offset + len != aligned_end) {
+   if (file_offset + len != cur) {
btrfs_release_path();
-   ret = lookup_data_extent(root, , ino, aligned_end,
+   ret = lookup_data_extent(root, , ino, cur,
 _offset);
/* <0 is error, >0 means no extent */
if (ret)
goto out;
fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
struct btrfs_file_extent_item);
-   ret = read_and_truncate_page(, fi, aligned_end,
-   file_offset + len - aligned_end,
-   dest + aligned_end - file_offset);
+   ret = read_and_truncate_page(, fi, cur,
+   file_offset + len - cur,
+   dest + cur - file_offset);
}
  out:
btrfs_release_path();



Re: [PATCH U-BOOT 1/3] btrfs: fix offset within btrfs_read_extent_reg()

2023-04-17 Thread Qu Wenruo




On 2023/4/18 09:17, Dominique Martinet wrote:

From: Dominique Martinet 


The subject can be changed to "btrfs: fix offset when reading compressed 
extents".

The original one is a little too generic.



btrfs_read_extent_reg correctly computed the extent offset in the
BTRFS_COMPRESS_NONE case, but did not account for the 'offset - key.offset'
part correctly in the compressed case, making the function read
incorrect data.

In the case I examined, the last 4k of a file was corrupted and
contained data from a few blocks prior:
btrfs_file_read()
  -> btrfs_read_extent_reg
 (aligned part loop from 0x4048 to 0x40ba, 128KB at a time)
  last read had 0x4000 bytes in extent, but aligned_end - cur limited
  the read to 0x3000 (offset 0x72)
  -> read_and_truncate_page
-> btrfs_read_extent_reg
   reading the last 0x1000 bytes from offset 0x73000 (end of the
   previous 0x4000) into 0x40ba3000
   here key.offset = 0x7 so we need to use it to recover the
   0x3000 offset.


You can use "btrfs ins dump-tree" to provide a much easier to read 
on-disk data layout.


And you can also add a smaller reproducer.



Confirmed by checking data, before patch:
u-boot=> mmc load 1:1 $loadaddr boot/uImage
u-boot=> md 0x40ba
40ba: c0232ff8 c0c722cb 030e94be bf1c./#.."..
u-boot=> md 0x40ba3000
40ba3000: c0232ff8 c0c722cb 030e94be bf1c./#.."..

After patch (also confirmed with data read from linux):
u-boot=> md 0x40ba3000
40ba3000: 64cc9f03 81142256 691c 0c03483c...dV".i
---
  fs/btrfs/inode.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40025662f250..3d6e39e6544d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -443,6 +443,8 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
   IS_ALIGNED(len, fs_info->sectorsize));
ASSERT(offset >= key.offset &&
   offset + len <= key.offset + extent_num_bytes);
+   /* offset is now offset within extent */
+   offset = btrfs_file_extent_offset(leaf, fi) + offset - key.offset;


I prefer not to use the @offset, explained later.

  
  	/* Preallocated or hole , fill @dest with zero */

if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_PREALLOC ||
@@ -454,9 +456,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
if (btrfs_file_extent_compression(leaf, fi) == BTRFS_COMPRESS_NONE) {
u64 logical;
  
-		logical = btrfs_file_extent_disk_bytenr(leaf, fi) +

- btrfs_file_extent_offset(leaf, fi) +
- offset - key.offset;
+   logical = btrfs_file_extent_disk_bytenr(leaf, fi) + offset;


This is touching non-compressed path, which is very weird as your commit 
message said this part is correct.


I prefer the original one to show everything we need to take into 
consideration.


Otherwise looks good to me.

Thanks,
Qu

read = len;
  
  		num_copies = btrfs_num_copies(fs_info, logical, len);

@@ -511,7 +511,7 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
if (ret < dsize)
memset(dbuf + ret, 0, dsize - ret);
/* Then copy the needed part */
-   memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len);
+   memcpy(dest, dbuf + offset, len);
ret = len;
  out:
free(cbuf);



[PATCH] fs: btrfs: limit the mapped length to the original length

2023-02-12 Thread Qu Wenruo
[BUG]
There is a bug report that btrfs driver caused hang during file read:

  This breaks btrfs on the HiFive Unmatched.

  => pci enum
  PCIE-0: Link up (Gen1-x8, Bus0)
  => nvme scan
  => load nvme 0:2 0x8c00 /boot/dtb/sifive/hifive-unmatched-a00.dtb
  [hangs]

[CAUSE]
The reporter provided some debug output:

  read_extent_data: cur=615817216, orig_len=16384, cur_len=16384
  read_extent_data: btrfs_map_block: cur_len=479944704; ret=0
  read_extent_data: ret=0
  read_extent_data: cur=615833600, orig_len=4096, cur_len=4096
  read_extent_data: btrfs_map_block: cur_len=479928320; ret=0

Note the second and the last line, the @cur_len is 450+MiB, which is
almost a chunk size.

And inside __btrfs_map_block(), we limits the returned value to stripe
length, but that's depending on the chunk type:

if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
 BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 |
 BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
 BTRFS_BLOCK_GROUP_RAID10 |
 BTRFS_BLOCK_GROUP_DUP)) {
/* we limit the length of each bio to what fits in a stripe */
*length = min_t(u64, ce->size - offset,
  map->stripe_len - stripe_offset);
} else {
*length = ce->size - offset;
}

This means, if the chunk is SINGLE profile, then we don't limit the
returned length at all, and even for other profiles, we can still return
a length much larger than the requested one.

[FIX]
Properly clamp the returned length, preventing it from returning a much
larger range than expected.

Reported-by: Andreas Schwab 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4aaaeab663f5..7d4095d9ca88 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -956,6 +956,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
struct btrfs_mapping_tree *map_tree = _info->mapping_tree;
struct cache_extent *ce;
struct map_lookup *map;
+   u64 orig_len = *length;
u64 offset;
u64 stripe_offset;
u64 *raid_map = NULL;
@@ -1047,6 +1048,7 @@ again:
} else {
*length = ce->size - offset;
}
+   *length = min_t(u64, *length, orig_len);
 
if (!multi_ret)
goto out;
-- 
2.39.1



Re: [PATCH] fs/btrfs: handle data extents, which crosss stripe boundaries, correctly

2023-02-12 Thread Qu Wenruo




On 2023/2/13 00:20, Andreas Schwab wrote:

When I print ce->size in __btrfs_map_block, it is almost always
1073741824, which looks bogus.


Can you provide the image of that filesystem?

Thanks,
Qu


Re: Possible bug in BTRFS w/ Duplication

2022-12-30 Thread Qu Wenruo




On 2022/12/30 23:28, Sam Winchenbach wrote:

I believe you fixed the issue with the patch you presented. I was in the 
process of testing a similar fix for release and it solved the issue I 
encountered.


But I still want to make sure that only RAID0 on single device can cause 
the problem.


For multi-device RAID0/RAID10/RAID5/6, we don't support them until we 
can scan all devices in U-boot...


Thanks,
Qu


Thanks,
Sam Winchenbach

-Original Message-
From: U-Boot  On Behalf Of Qu Wenruo
Sent: Thursday, December 29, 2022 7:01 PM
To: Heinrich Schuchardt ; Sam Winchenbach 
; Marek Behún 
Cc: u-boot@lists.denx.de; Qu Wenruo ; linux-bt...@vger.kernel.org
Subject: Re: Possible bug in BTRFS w/ Duplication



On 2022/12/29 22:12, Heinrich Schuchardt wrote:

On 12/28/22 21:51, Sam Winchenbach wrote:

Hello,

Hello, I have hit the following situation when trying to load files
from a BTRFS partition with duplication enabled.


You mean multi-device?

For DUP/RAID1 duplication, they don't have stripe limitation at all.

Thus I believe you're talking about RAID0 (which doesn't have any 
duplication/extra mirrors) or RAID10 or RAID5/6?

But for now, we don't support multi-device in U-boot yet, thus I'm not sure 
what situation you're talking about.

Mind to run the following command?

   # btrfs fi usage 



In the first example I read a 16KiB file - __btrfs_map_block()
changes the length to something larger than the file being read. This
works fine, as length is later clamped to the file size.

In the second example, __btrfs_map_block() changes the length
parameter to something smaller than the file (the size of a stripe).
This seems to break this check here:

  read = len;
  num_copies = btrfs_num_copies(fs_info, logical, len);
  for (i = 1; i <= num_copies; i++) {
  ret = read_extent_data(fs_info, dest, logical, , i);
  if (ret < 0 || read != len) {
  continue;
  }
  finished = true;
  break;
  }

The problem being that read is always less than len.

I am not sure if __btrfs_map_block is changing "len" to the incorrect
value, or if there is some logic in "read_extent_data" that isn't
correct. Any pointers on how this code is supposed to work would be
greatly appreciated.
Thanks.


Thanks for reporting the issue

$ scripts/get_maintainer.pl -f fs/btrfs/volumes.c

suggests to include

"Marek Behún"  (maintainer:BTRFS) Qu Wenruo
 (reviewer:BTRFS) linux-bt...@vger.kernel.org

to the communication.

Best regards

Heinrich



=== EXAMPLE 2 ===
Zynq> load mmc 1:0 0 16K
[btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part ===
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data
(ret = 0, read = 16384, len = 16384)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block
(len = 16384) [read_extent_data,fs/btrfs/disk-io.c:550] after
__btrfs_map_block (len = 28672)
[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len
= 16384) [read_extent_data,fs/btrfs/disk-io.c:568] after
__btrfs_devread (len =
16384)
[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data
(ret = 0, read = 16384, len = 16384)
cur: 0, extent_num_bytes: 16384, aligned_end: 16384
16384 bytes read in 100 ms (159.2 KiB/s)

=== EXAMPLE 2 ===
Zynq> load mmc 1:0 0 32K
[btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part ===
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data
(ret = 0, read = 32768, len = 32768)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block
(len = 32768) [read_extent_data,fs/btrfs/disk-io.c:550] after
__btrfs_map_block (len = 12288)
[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len
= 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after
__btrfs_devread (len =
12288)


So the first 3 sectors are before the stripe boundary and we read it correctly.


[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data
(ret = 0, read = 12288, len = 32768)
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data
(ret = 0, read = 12288, len = 32768)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block
(len = 12288) [read_extent_data,fs/btrfs/disk-io.c:550] after
__btrfs_map_block (len = 12288)


I believe this is the problem.

If we're reading the full 32K, and the first 12K is in the first stripe, we 
should then try to map the remaining 20K, not the 12K again.

I'll look into the situation.
But if you can provide the image or the dump, it can greatly help the debugging.

Thanks,
Qu


[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len
= 12288) [read_extent_data,fs/btrfs/disk-io.c:568] after
__btrfs_devread (len =
12288)
[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data
(ret = 0, read = 12288, len = 32768)
file: fs/btrfs/inode.c, line: 468
cur: 0, extent_num_bytes: 32768, aligned_end: 32768
-> btrfs_read_extent_reg: -5, line: 75

[PATCH] fs/btrfs: handle data extents, which crosss stripe boundaries, correctly

2022-12-29 Thread Qu Wenruo
[BUG]
Since btrfs supports single device RAID0 at mkfs time after btrfs-progs
v5.14, if we create a single device raid0 btrfs, and created a file
crossing stripe boundary:

  # mkfs.btrfs -m dup -d raid0 test.img
  # mount test.img mnt
  # xfs_io -f -c "pwrite 0 128K" mnt/file
  # umount mnt

Since btrfs is using 64K as stripe length, above 128K data write is
definitely going to cross at least one stripe boundary.

Then u-boot would fail to read above 128K file:

 => host bind 0 /home/adam/test.img
 => ls host 0
 <   > 131072  Fri Dec 30 00:18:25 2022  file
 => load host 0 0 file
 BTRFS: An error occurred while reading file file
 Failed to load 'file'

[CAUSE]
Unlike tree blocks read, data extent reads doesn't consider cases in which
one data extent can cross stripe boundary.

In read_data_extent(), we just call btrfs_map_block() once and read the
first mapped range.

And if the first mapped range is smaller than the desired range, it
would return error.

But since even single device btrfs can utilize RAID0 profiles, the first
mapped range can only be at most 64K for RAID0 profiles, and cause false
error.

[FIX]
Just like read_whole_eb(), we should call btrfs_map_block() in a loop
until we read all data.

Since we're here, also add extra error messages for the following cases:

- btrfs_map_block() failure
  We already have the error message for it.

- Missing device
  This should not happen, as we only support single device for now.

- __btrfs_devread() failure

With this bug fixed, btrfs driver of u-boot can properly read the above
128K file, and have the correct content:

 => host bind 0 /home/adam/test.img
 => ls host 0
 <   > 131072  Fri Dec 30 00:18:25 2022  file
 => load host 0 0 file
 131072 bytes read in 0 ms
 => md5sum 0 0x2
 md5 for  ... 0001 ==> d48858312a922db7eb86377f638dbc9f
 ^^^ Above md5sum also matches.

Reported-by: Sam Winchenbach 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/disk-io.c | 49 +-
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f0d9f1c113b..7eaa7e949604 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -541,34 +541,39 @@ struct extent_buffer* read_tree_block(struct 
btrfs_fs_info *fs_info, u64 bytenr,
 int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical,
 u64 *len, int mirror)
 {
-   u64 offset = 0;
+   u64 orig_len = *len;
+   u64 cur = logical;
struct btrfs_multi_bio *multi = NULL;
struct btrfs_device *device;
int ret = 0;
-   u64 max_len = *len;
 
-   ret = btrfs_map_block(fs_info, READ, logical, len, , mirror,
- NULL);
-   if (ret) {
-   fprintf(stderr, "Couldn't map the block %llu\n",
-   logical + offset);
-   goto err;
-   }
-   device = multi->stripes[0].dev;
+   while (cur < logical + orig_len) {
+   u64 cur_len = logical + orig_len - cur;
 
-   if (*len > max_len)
-   *len = max_len;
-   if (!device->desc || !device->part) {
-   ret = -EIO;
-   goto err;
-   }
-
-   ret = __btrfs_devread(device->desc, device->part, data, *len,
- multi->stripes[0].physical);
-   if (ret != *len)
-   ret = -EIO;
-   else
+   ret = btrfs_map_block(fs_info, READ, cur, _len, ,
+ mirror, NULL);
+   if (ret) {
+   error("Couldn't map the block %llu", cur);
+   goto err;
+   }
+   device = multi->stripes[0].dev;
+   if (!device->desc || !device->part) {
+   error("devid %llu is missing", device->devid);
+   ret = -EIO;
+   goto err;
+   }
+   ret = __btrfs_devread(device->desc, device->part,
+   data + (cur - logical), cur_len,
+   multi->stripes[0].physical);
+   if (ret != cur_len) {
+   error("read failed on devid %llu physical %llu",
+ device->devid, multi->stripes[0].physical);
+   ret = -EIO;
+   goto err;
+   }
+   cur += cur_len;
ret = 0;
+   }
 err:
kfree(multi);
return ret;
-- 
2.39.0



Re: Possible bug in BTRFS w/ Duplication

2022-12-29 Thread Qu Wenruo




On 2022/12/29 22:12, Heinrich Schuchardt wrote:

On 12/28/22 21:51, Sam Winchenbach wrote:

Hello,

Hello, I have hit the following situation when trying to load files 
from a BTRFS partition with duplication enabled.


You mean multi-device?

For DUP/RAID1 duplication, they don't have stripe limitation at all.

Thus I believe you're talking about RAID0 (which doesn't have any 
duplication/extra mirrors) or RAID10 or RAID5/6?


But for now, we don't support multi-device in U-boot yet, thus I'm not 
sure what situation you're talking about.


Mind to run the following command?

 # btrfs fi usage 



In the first example I read a 16KiB file - __btrfs_map_block() changes 
the length to something larger than the file being read. This works 
fine, as length is later clamped to the file size.


In the second example, __btrfs_map_block() changes the length 
parameter to something smaller than the file (the size of a stripe).  
This seems to break this check here:


 read = len;
 num_copies = btrfs_num_copies(fs_info, logical, len);
 for (i = 1; i <= num_copies; i++) {
 ret = read_extent_data(fs_info, dest, logical, , i);
 if (ret < 0 || read != len) {
 continue;
 }
 finished = true;
 break;
 }

The problem being that read is always less than len.

I am not sure if __btrfs_map_block is changing "len" to the incorrect 
value, or if there is some logic in "read_extent_data" that isn't 
correct. Any pointers on how this code is supposed to work would be 
greatly appreciated.

Thanks.


Thanks for reporting the issue

$ scripts/get_maintainer.pl -f fs/btrfs/volumes.c

suggests to include

"Marek Behún"  (maintainer:BTRFS)
Qu Wenruo  (reviewer:BTRFS)
linux-bt...@vger.kernel.org

to the communication.

Best regards

Heinrich



=== EXAMPLE 2 ===
Zynq> load mmc 1:0 0 16K
[btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part ===
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data 
(ret = 0, read = 16384, len = 16384)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block 
(len = 16384)
[read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len 
= 28672)
[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len 
= 16384)
[read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 
16384)
[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data 
(ret = 0, read = 16384, len = 16384)

cur: 0, extent_num_bytes: 16384, aligned_end: 16384
16384 bytes read in 100 ms (159.2 KiB/s)

=== EXAMPLE 2 ===
Zynq> load mmc 1:0 0 32K
[btrfs_file_read,fs/btrfs/inode.c:710] === read the aligned part ===
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data 
(ret = 0, read = 32768, len = 32768)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block 
(len = 32768)
[read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len 
= 12288)
[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len 
= 12288)
[read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 
12288)


So the first 3 sectors are before the stripe boundary and we read it 
correctly.


[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data 
(ret = 0, read = 12288, len = 32768)
[btrfs_read_extent_reg,fs/btrfs/inode.c:458] before read_extent_data 
(ret = 0, read = 12288, len = 32768)
[read_extent_data,fs/btrfs/disk-io.c:547] before __btrfs_map_block 
(len = 12288)
[read_extent_data,fs/btrfs/disk-io.c:550] after __btrfs_map_block (len 
= 12288)


I believe this is the problem.

If we're reading the full 32K, and the first 12K is in the first stripe, 
we should then try to map the remaining 20K, not the 12K again.


I'll look into the situation.
But if you can provide the image or the dump, it can greatly help the 
debugging.


Thanks,
Qu

[read_extent_data,fs/btrfs/disk-io.c:565] before __btrfs_devread (len 
= 12288)
[read_extent_data,fs/btrfs/disk-io.c:568] after __btrfs_devread (len = 
12288)
[btrfs_read_extent_reg,fs/btrfs/inode.c:460] after read_extent_data 
(ret = 0, read = 12288, len = 32768)

file: fs/btrfs/inode.c, line: 468
cur: 0, extent_num_bytes: 32768, aligned_end: 32768
-> btrfs_read_extent_reg: -5, line: 758
BTRFS: An error occurred while reading file 32K
Failed to load '32K'





Sam Winchenbach
Embedded Software Engineer III
Tethers Unlimited, Inc. | Connect Your Universe | www.tethers.com
swinchenb...@tethers.com | C: 207-974-6934
11711 North Creek Pkwy # D113, Bothell, WA 98011-8808, USA




Re: [PATCH RESEND] fs: btrfs: remove the usage of undeclared fs_mutex variable

2022-09-28 Thread Qu Wenruo




On 2022/9/28 23:23, Pankaj Raghav wrote:

This line probably got in by mistake as there is no fs_mutex member in
the btrfs_fs_info struct.

Signed-off-by: Pankaj Raghav 


Reviewed-by: Qu Wenruo 

Thanks,
Qu


---
For UBOOT.
The patch I sent the first time is not showing up in the uboot mailing list
because of the missing approval. So, resending the patch.

  fs/btrfs/disk-io.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8043abc1bd..c80f8e8028 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void)
fs_info->fs_root_tree = RB_ROOT;
cache_tree_init(_info->mapping_tree.cache_tree);

-   mutex_init(_info->fs_mutex);
-
return fs_info;
  free_all:
btrfs_free_fs_info(fs_info);


Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable

2022-09-27 Thread Qu Wenruo




On 2022/9/27 18:49, Pankaj Raghav wrote:




Which branch is the code based on?


I actually cloned it from u-boot github master.
https://github.com/u-boot/u-boot/blob/f117c54cc83e3c519883edb5a48062644d38c443/fs/btrfs/disk-io.c#L785


I don't believe it's upstream, as such compiling error should be exposed
very easily.


Actually, I was also surprised but the compiler did not give me any error
because the mutex definition was as follows:
#define mutex_init(...)

and the compiler did not generate any error because we don't do anything
the parameter.


---
   fs/btrfs/disk-io.c | 2 --
   1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c


Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag
'x86_urgent_for_v6.0-rc8' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no
btrfs_new_fs_info() function defined anyway.


Hmm, I sent these patches for u-boot. Maybe there is a misunderstanding
here? :)


Yeah, you're completely right, I thought it's for kernel by default...

Sorry for the confusion.
Qu




THanks,
Qu


index 8043abc1bd..c80f8e8028 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void)
   fs_info->fs_root_tree = RB_ROOT;
   cache_tree_init(_info->mapping_tree.cache_tree);
   -    mutex_init(_info->fs_mutex);
-
   return fs_info;
   free_all:
   btrfs_free_fs_info(fs_info);


Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable

2022-09-27 Thread Qu Wenruo




On 2022/9/27 18:39, Qu Wenruo wrote:



On 2022/9/27 17:55, Pankaj Raghav wrote:

This line probably got in by mistake as there is no fs_mutex member in
the btrfs_fs_info struct.

Signed-off-by: Pankaj Raghav 


Which branch is the code based on?

I don't believe it's upstream, as such compiling error should be exposed 
very easily.



---
  fs/btrfs/disk-io.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c


Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag 
'x86_urgent_for_v6.0-rc8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no 
btrfs_new_fs_info() function defined anyway.


My bad, I didn't notice it's for Uboot, not kernel.

In that case, you can safely remove it, but it won't cause compile error 
since in , we define mutex_init() as noop, thus it 
doesn't cause any compile error.


Anyway the patch itself looks good, I also checked if there is other 
mutex related usage, and this is the only one.


Reviewed-by: Qu Wenruo 

Thanks,
Qu



THanks,
Qu


index 8043abc1bd..c80f8e8028 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void)
  fs_info->fs_root_tree = RB_ROOT;
  cache_tree_init(_info->mapping_tree.cache_tree);
-    mutex_init(_info->fs_mutex);
-
  return fs_info;
  free_all:
  btrfs_free_fs_info(fs_info);


Re: [PATCH] fs: btrfs: remove the usage of undeclared fs_mutex variable

2022-09-27 Thread Qu Wenruo




On 2022/9/27 17:55, Pankaj Raghav wrote:

This line probably got in by mistake as there is no fs_mutex member in
the btrfs_fs_info struct.

Signed-off-by: Pankaj Raghav 


Which branch is the code based on?

I don't believe it's upstream, as such compiling error should be exposed 
very easily.



---
  fs/btrfs/disk-io.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c


Furthermore at current upstream HEAD a1375562c0a8 ("Merge tag 
'x86_urgent_for_v6.0-rc8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"), there is no 
btrfs_new_fs_info() function defined anyway.


THanks,
Qu


index 8043abc1bd..c80f8e8028 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -782,8 +782,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(void)
fs_info->fs_root_tree = RB_ROOT;
cache_tree_init(_info->mapping_tree.cache_tree);
  
-	mutex_init(_info->fs_mutex);

-
return fs_info;
  free_all:
btrfs_free_fs_info(fs_info);


[PATCH v3 8/8] fs: erofs: add unaligned read range handling

2022-08-15 Thread Qu Wenruo
I'm not an expert on erofs, but my quick glance didn't expose any
special handling on unaligned range, thus I think the U-boot erofs
driver doesn't really support unaligned read range.

This patch will add erofs_get_blocksize() so erofs can benefit from the
generic unaligned read support.

Cc: Huang Jianan 
Cc: linux-er...@lists.ozlabs.org
Signed-off-by: Qu Wenruo 
---
 fs/erofs/internal.h | 1 +
 fs/erofs/super.c| 6 ++
 fs/fs.c | 2 +-
 include/erofs.h | 1 +
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4af7c91560cc..d368a6481bf1 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -83,6 +83,7 @@ struct erofs_sb_info {
u16 available_compr_algs;
u16 lz4_max_distance;
u32 checksum;
+   u32 blocksize;
u16 extra_devices;
union {
u16 devt_slotoff;   /* used for mkfs */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8277d9b53fb3..82625da59001 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -99,7 +99,13 @@ int erofs_read_superblock(void)
 
sbi.build_time = le64_to_cpu(dsb->build_time);
sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
+   sbi.blocksize = 1 << blkszbits;
 
memcpy(, dsb->uuid, sizeof(dsb->uuid));
return erofs_init_devices(, dsb);
 }
+
+int erofs_get_blocksize(const char *filename)
+{
+   return sbi.blocksize;
+}
diff --git a/fs/fs.c b/fs/fs.c
index 43c7128bcfc5..2ac43c05fcd8 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -376,7 +376,7 @@ static struct fstype_info fstypes[] = {
.readdir = erofs_readdir,
.ls = fs_ls_generic,
.read = erofs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = erofs_get_blocksize,
.size = erofs_size,
.close = erofs_close,
.closedir = erofs_closedir,
diff --git a/include/erofs.h b/include/erofs.h
index 1fbe82bf72cb..18bd6807c538 100644
--- a/include/erofs.h
+++ b/include/erofs.h
@@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc,
struct disk_partition *fs_partition);
 int erofs_read(const char *filename, void *buf, loff_t offset,
   loff_t len, loff_t *actread);
+int erofs_get_blocksize(const char *filename);
 int erofs_size(const char *filename, loff_t *size);
 int erofs_exists(const char *filename);
 void erofs_close(void);
-- 
2.37.1



[PATCH v3 7/8] fs: ubifs: rely on higher layer to do unaligned read

2022-08-15 Thread Qu Wenruo
Currently ubifs doesn't support unaligned read offset, thanks to the
recent _fs_read() work to handle unaligned read, we only need to
implement ubifs_get_blocksize() to take advantage of it.

Now ubifs can do unaligned read without any problem.

Signed-off-by: Qu Wenruo 
---
Unfortunately I can not test ubifs, as enabling UBI would cause compile
failure due to missing of  header.
---
 fs/fs.c   |  2 +-
 fs/ubifs/ubifs.c  | 13 -
 include/ubifs_uboot.h |  1 +
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index ea4325cd0b00..43c7128bcfc5 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -312,7 +312,7 @@ static struct fstype_info fstypes[] = {
.exists = ubifs_exists,
.size = ubifs_size,
.read = ubifs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ubifs_get_blocksize,
.write = fs_write_unsupported,
.uuid = fs_uuid_unsupported,
.opendir = fs_opendir_unsupported,
diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index d3026e310168..a8ab556dd376 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t 
offset,
 
*actread = 0;
 
-   if (offset & (PAGE_SIZE - 1)) {
-   printf("ubifs: Error offset must be a multiple of %d\n",
-  PAGE_SIZE);
-   return -1;
-   }
+   /* Higher layer should ensure it always pass page aligned range. */
+   assert(IS_ALIGNED(offset, PAGE_SIZE));
+   assert(IS_ALIGNED(size, PAGE_SIZE));
 
c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY);
/* ubifs_findfile will resolve symlinks, so we know that we get
@@ -920,6 +918,11 @@ out:
return err;
 }
 
+int ubifs_get_blocksize(const char *filename)
+{
+   return PAGE_SIZE;
+}
+
 void ubifs_close(void)
 {
 }
diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h
index b025779d59ff..bcd21715314a 100644
--- a/include/ubifs_uboot.h
+++ b/include/ubifs_uboot.h
@@ -29,6 +29,7 @@ int ubifs_exists(const char *filename);
 int ubifs_size(const char *filename, loff_t *size);
 int ubifs_read(const char *filename, void *buf, loff_t offset,
   loff_t size, loff_t *actread);
+int ubifs_get_blocksize(const char *filename);
 void ubifs_close(void);
 
 #endif /* __UBIFS_UBOOT_H__ */
-- 
2.37.1



[PATCH v3 6/8] fs: fat: rely on higher layer to get block aligned read range

2022-08-15 Thread Qu Wenruo
Just implement fat_get_blocksize() for fat, so that fat_read_file()
always get a block aligned read range.

Unfortunately I'm not experienced enough to cleanup the fat code, thus
further cleanup is appreciated.

Cc: Tom Rini 
Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 13 +
 fs/fs.c   |  2 +-
 include/fat.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index dcceccbcee0a..e13035e8e6d1 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, 
loff_t offset, loff_t len,
return ret;
 }
 
+int fat_get_blocksize(const char *filename)
+{
+   fsdata fsdata = {0};
+   int ret;
+
+   ret = get_fs_info();
+   if (ret)
+   return ret;
+
+   free(fsdata.fatbuf);
+   return fsdata.sect_size;
+}
+
 typedef struct {
struct fs_dir_stream parent;
struct fs_dirent dirent;
diff --git a/fs/fs.c b/fs/fs.c
index b26cc8e2e840..ea4325cd0b00 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -208,7 +208,7 @@ static struct fstype_info fstypes[] = {
.exists = fat_exists,
.size = fat_size,
.read = fat_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = fat_get_blocksize,
 #if CONFIG_IS_ENABLED(FAT_WRITE)
.write = file_fat_write,
.unlink = fat_unlink,
diff --git a/include/fat.h b/include/fat.h
index a9756fb4cd1b..c03a2bebecef 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -201,6 +201,7 @@ int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
+int fat_get_blocksize(const char *filename);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
 
-- 
2.37.1



[PATCH v3 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs

2022-08-15 Thread Qu Wenruo
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code
to handle the unaligned read (aka, read range is not aligned to fs block
size).

This makes underlying fs code harder to implement, as  they have to handle
unaligned read all by themselves.

This patch will change the behavior, starting from btrfs, by moving the
unaligned read code into _fs_read().

The idea is pretty simple, if we have an unaligned read request, we
handle it in the following steps:

1. Grab the blocksize of the fs

2. Read the leading unaligned range
   We will read the block that @offset is in, and copy the
   requested part into buf.

   The the block we read covers the whole range, we just call it a day.

3. Read the remaining part
   The tailing part may be unaligned, but all fses handles the tailing
   part much easier than the leading unaligned part.

   As they just need to do a min(extent_size, start + len - cur) to
   calculate the real read size.

   In fact, for most file reading, the file size is not aligned and we
   need to handle the tailing part anyway.

There is a btrfs specific cleanup involved:

- In btrfs_file_read(), merge the tailing unaligned read into the main
  loop.
  Just reuse the existing read length calculation is enough.

- Remove read_and_truncate_page() call
  Since there is no explicit leading/tailing unaligned read anymore.

This has been tested with a proper randomly populated btrfs file, then
tried in sandbox mode with different aligned and unaligned range and
compare the output with md5sum.

Cc: Marek Behun 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c |  10 
 fs/btrfs/inode.c |  89 +++-
 fs/fs.c  | 130 ---
 include/btrfs.h  |   1 +
 4 files changed, 141 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 74a992fa012d..ac0e972d0249 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
int ret;
 
ASSERT(fs_info);
+
+   /* Higher layer has ensures it never pass unaligned offset in. */
+   ASSERT(IS_ALIGNED(offset, fs_info->sectorsize));
+
ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID,
file, , , , 40);
if (ret < 0) {
@@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return 0;
 }
 
+int btrfs_get_blocksize(const char *filename)
+{
+   ASSERT(current_fs_info);
+   return current_fs_info->sectorsize;
+}
+
 void btrfs_close(void)
 {
if (current_fs_info) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40025662f250..f12be46f6262 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -617,44 +617,6 @@ check_next:
return 1;
 }
 
-static int read_and_truncate_page(struct btrfs_path *path,
- struct btrfs_file_extent_item *fi,
- int start, int len, char *dest)
-{
-   struct extent_buffer *leaf = path->nodes[0];
-   struct btrfs_fs_info *fs_info = leaf->fs_info;
-   u64 aligned_start = round_down(start, fs_info->sectorsize);
-   u8 extent_type;
-   char *buf;
-   int page_off = start - aligned_start;
-   int page_len = fs_info->sectorsize - page_off;
-   int ret;
-
-   ASSERT(start + len <= aligned_start + fs_info->sectorsize);
-   buf = malloc_cache_aligned(fs_info->sectorsize);
-   if (!buf)
-   return -ENOMEM;
-
-   extent_type = btrfs_file_extent_type(leaf, fi);
-   if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
-   free(buf);
-   return len;
-   }
-
-   ret = btrfs_read_extent_reg(path, fi,
-   round_down(start, fs_info->sectorsize),
-   fs_info->sectorsize, buf);
-   if (ret < 0) {
-   free(buf);
-   return ret;
-   }
-   memcpy(dest, buf + page_off, page_len);
-   free(buf);
-   return len;
-}
-
 int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
char *dest)
 {
@@ -663,7 +625,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
struct btrfs_path path;
struct btrfs_key key;
u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
-   u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
u64 next_offset;
u64 cur = aligned_start;
int ret = 0;
@@ -673,34 +634,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
/* Set the whole dest all zero, so we won't nee

[PATCH v3 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read

2022-08-15 Thread Qu Wenruo
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file().

Signed-off-by: Qu Wenruo 
---
Several cleanup candicate:

1. ext_fs->blksz is never populated, thus it's always 0
   We can not easily grab blocksize just like btrfs in this case.

   Thus we have to go the same calculation in ext4fs_read_file().

2. can not easily cleanup @skipfirst in ext4fs_read_file()
   It seems to screw up with soft link read.
---
 fs/ext4/ext4fs.c | 22 ++
 fs/fs.c  |  2 +-
 include/ext4fs.h |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c
index 4c89152ce4ad..b0568e77a895 100644
--- a/fs/ext4/ext4fs.c
+++ b/fs/ext4/ext4fs.c
@@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
short status;
struct ext_block_cache cache;
 
+   assert(IS_ALIGNED(pos, blocksize));
+
ext_cache_init();
 
/* Adjust len so it we can't read past the end of the file. */
@@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t 
offset, loff_t len,
return ext4fs_read(buf, offset, len, len_read);
 }
 
+int ext4_get_blocksize(const char *filename)
+{
+   struct ext_filesystem *fs;
+   int log2blksz;
+   int log2_fs_blocksize;
+   loff_t file_len;
+   int ret;
+
+   ret = ext4fs_open(filename, _len);
+   if (ret < 0) {
+   printf("** File not found %s **\n", filename);
+   return -1;
+   }
+   fs = get_fs();
+   log2blksz = fs->dev_desc->log2blksz;
+   log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz;
+
+   return (1 << (log2_fs_blocksize + log2blksz));
+}
+
 int ext4fs_uuid(char *uuid_str)
 {
if (ext4fs_root == NULL)
diff --git a/fs/fs.c b/fs/fs.c
index 6d43d616de4b..b26cc8e2e840 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -237,7 +237,7 @@ static struct fstype_info fstypes[] = {
.exists = ext4fs_exists,
.size = ext4fs_size,
.read = ext4_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ext4_get_blocksize,
 #ifdef CONFIG_CMD_EXT4_WRITE
.write = ext4_write_file,
.ln = ext4fs_create_link,
diff --git a/include/ext4fs.h b/include/ext4fs.h
index cb5d9cc0a5c0..0f4cf32dcc2a 100644
--- a/include/ext4fs.h
+++ b/include/ext4fs.h
@@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc,
 struct disk_partition *fs_partition);
 int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len,
   loff_t *actread);
+int ext4_get_blocksize(const char *filename);
 int ext4_read_superblock(char *buffer);
 int ext4fs_uuid(char *uuid_str);
 void ext_cache_init(struct ext_block_cache *cache);
-- 
2.37.1



[PATCH v3 3/8] fs: btrfs: fix a crash if specified range is beyond file size

2022-08-15 Thread Qu Wenruo
[BUG]
When try to read a range beyond file size, btrfs driver will cause
crash/segfault:

 => load host 0 $kernel_addr_r 5k_file 0 0x2000
 SEGFAULT

[CAUSE]
In btrfs_read(), if @len is 0, we will truncated it to file end, but if
file end is beyond our file size, this truncation will underflow @len,
making it -3K in this case.

And later that @len is used to memzero the output buffer, resulting
above crash.

[FIX]
Just error out if @offset is already beyond our file size.

Now it will fail properly with correct error message:

 => load host 0 $kernel_addr_r 5m_origin 0 0x2000
 BTRFS: Read range beyond file size, offset 8192 file size 5120

 Failed to load '5m_origin'

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 309cd595d37d..74a992fa012d 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return ret;
}
 
+   if (offset >= real_size) {
+   error("Read range beyond file size, offset %llu file size %llu",
+   offset, real_size);
+   return -EINVAL;
+   }
+
/*
 * If the length is 0 (meaning read the whole file) or the range is
 * beyond file size, truncate it to the end of the file.
-- 
2.37.1



[PATCH v3 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0

2022-08-15 Thread Qu Wenruo
[BUG]
When testing with unaligned read, if a specific length is passed in,
btrfs driver will read out nothing:

 => load host 0 $kernel_addr_r 5k_file 0x1000 0
 0 bytes read in 0 ms

But if no length is passed in, it works fine, even if we pass a non-zero
length:

 => load host 0 $kernel_addr_r 5k_file 0 0x1000
 1024 bytes read in 0 ms

[CAUSE]
In btrfs_read() if we have a larger size than our file, we will try to
truncate it using the file size.

However the real file size is not initialized if @len is not zero, thus
we always truncate our length to 0, and cause the problem.

[FIX]
Fix it by just always do the file size check.

In fact btrfs_size() always follow soft link, thus it will return the
real file size correctly.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 4cdbbbe3d066..309cd595d37d 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t 
offset, loff_t len,
return -EINVAL;
}
 
-   if (!len) {
-   ret = btrfs_size(file, _size);
-   if (ret < 0) {
-   error("Failed to get inode size: %s", file);
-   return ret;
-   }
-   len = real_size;
+   ret = btrfs_size(file, _size);
+   if (ret < 0) {
+   error("Failed to get inode size: %s", file);
+   return ret;
}
 
-   if (len > real_size - offset)
+   /*
+* If the length is 0 (meaning read the whole file) or the range is
+* beyond file size, truncate it to the end of the file.
+*/
+   if (!len || len > real_size - offset)
len = real_size - offset;
 
ret = btrfs_file_read(root, ino, offset, len, buf);
-- 
2.37.1



[PATCH v3 1/8] fs: fat: unexport file_fat_read_at()

2022-08-15 Thread Qu Wenruo
That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 4 ++--
 include/fat.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index df9ea2c028fc..dcceccbcee0a 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1243,8 +1243,8 @@ out_free_itr:
return ret;
 }
 
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread)
+static int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
+   loff_t maxsize, loff_t *actread)
 {
fsdata fsdata;
fat_itr *itr;
diff --git a/include/fat.h b/include/fat.h
index bd8e450b33a3..a9756fb4cd1b 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect)
 int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
-- 
2.37.1



[PATCH v3 0/8] U-boot: fs: add generic unaligned read offset handling

2022-08-15 Thread Qu Wenruo
[CHANGELOG]
v3:
- Fix an error that we always return 0 actread bytes for unsupported fses
  For unsupported fses, we should also populate @total_read.
  Or we will just read the data but still return 0 for actually bytes.

  Now it pass all test_fs* cases.

v2
- Fix a linkage error where (U64 % U32) is called without proper helper
  Fix it with U64 & (U32 - 1), as the U32 value (@blocksize) should
  always be power of 2, thus (@blocksize - 1) is the mask we want to
  calculate the offset inside the block.

  Above change only affects the 4th patch, everything else is not
  touched.

RFC->v1:
- More (manual) testing
  Unfortunately, in the latest master (75967970850a), the fs-tests.sh
  always seems to hang at preparing the fs image.

  Thus still has to do manual testing, tested btrfs, ext4 and fat, with
  aligned and unaligned read, also added soft link read, all looks fine here.

  Extra testing is still appreciated.

- Two more btrfs specific bug fixes
  All exposed during manual tests

- Remove the tailing unaligned block handling
  In fact, all fses can easily handle such case, just a min() call is
  enough.

- Remove the support for sandboxfs
  Since it's using read() calls, really no need to do block alignment
  check.

- Enhanced blocksize check
  Ensure the returned blocksize is not only non-error, but also
  non-zero.


Qu Wenruo (8):
  fs: fat: unexport file_fat_read_at()
  fs: btrfs: fix a bug which no data get read if the length is not 0
  fs: btrfs: fix a crash if specified range is beyond file size
  fs: btrfs: move the unaligned read code to _fs_read() for btrfs
  fs: ext4: rely on _fs_read() to handle leading unaligned block read
  fs: fat: rely on higher layer to get block aligned read range
  fs: ubifs: rely on higher layer to do unaligned read
  fs: erofs: add unaligned read range handling

 fs/btrfs/btrfs.c  |  33 ---
 fs/btrfs/inode.c  |  89 +++--
 fs/erofs/internal.h   |   1 +
 fs/erofs/super.c  |   6 ++
 fs/ext4/ext4fs.c  |  22 +++
 fs/fat/fat.c  |  17 +-
 fs/fs.c   | 130 +++---
 fs/ubifs/ubifs.c  |  13 +++--
 include/btrfs.h   |   1 +
 include/erofs.h   |   1 +
 include/ext4fs.h  |   1 +
 include/fat.h |   3 +-
 include/ubifs_uboot.h |   1 +
 13 files changed, 212 insertions(+), 106 deletions(-)

-- 
2.37.1



Re: [PATCH v2 1/8] fs: fat: unexport file_fat_read_at()

2022-08-05 Thread Qu Wenruo




On 2022/8/6 05:14, Tom Rini wrote:

On Tue, Jul 26, 2022 at 01:22:09PM +0800, Qu Wenruo wrote:


That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 


Unfortunately, the series fails CI:
https://source.denx.de/u-boot/u-boot/-/jobs/478838



OK, it's a bug in the unsupported fses (which squashfs doesn't support)

The actual read bytes is not updated.

Sorry for the inconvenience.

Any idea that how to run the full tests locally so I can prevent such
problem?

Thanks,
Qu


[PATCH v2 8/8] fs: erofs: add unaligned read range handling

2022-07-25 Thread Qu Wenruo
I'm not an expert on erofs, but my quick glance didn't expose any
special handling on unaligned range, thus I think the U-boot erofs
driver doesn't really support unaligned read range.

This patch will add erofs_get_blocksize() so erofs can benefit from the
generic unaligned read support.

Cc: Huang Jianan 
Cc: linux-er...@lists.ozlabs.org
Signed-off-by: Qu Wenruo 
Reviewed-by: Huang Jianan 
---
 fs/erofs/internal.h | 1 +
 fs/erofs/super.c| 6 ++
 fs/fs.c | 2 +-
 include/erofs.h | 1 +
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4af7c91560cc..d368a6481bf1 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -83,6 +83,7 @@ struct erofs_sb_info {
u16 available_compr_algs;
u16 lz4_max_distance;
u32 checksum;
+   u32 blocksize;
u16 extra_devices;
union {
u16 devt_slotoff;   /* used for mkfs */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 4cca322b9ead..df01d2e719a7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -99,7 +99,13 @@ int erofs_read_superblock(void)
 
sbi.build_time = le64_to_cpu(dsb->build_time);
sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
+   sbi.blocksize = 1 << blkszbits;
 
memcpy(, dsb->uuid, sizeof(dsb->uuid));
return erofs_init_devices(, dsb);
 }
+
+int erofs_get_blocksize(const char *filename)
+{
+   return sbi.blocksize;
+}
diff --git a/fs/fs.c b/fs/fs.c
index 2b847d83597b..23cfb1f5025b 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -375,7 +375,7 @@ static struct fstype_info fstypes[] = {
.readdir = erofs_readdir,
.ls = fs_ls_generic,
.read = erofs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = erofs_get_blocksize,
.size = erofs_size,
.close = erofs_close,
.closedir = erofs_closedir,
diff --git a/include/erofs.h b/include/erofs.h
index 1fbe82bf72cb..18bd6807c538 100644
--- a/include/erofs.h
+++ b/include/erofs.h
@@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc,
struct disk_partition *fs_partition);
 int erofs_read(const char *filename, void *buf, loff_t offset,
   loff_t len, loff_t *actread);
+int erofs_get_blocksize(const char *filename);
 int erofs_size(const char *filename, loff_t *size);
 int erofs_exists(const char *filename);
 void erofs_close(void);
-- 
2.37.0



[PATCH v2 7/8] fs: ubifs: rely on higher layer to do unaligned read

2022-07-25 Thread Qu Wenruo
Currently ubifs doesn't support unaligned read offset, thanks to the
recent _fs_read() work to handle unaligned read, we only need to
implement ubifs_get_blocksize() to take advantage of it.

Now ubifs can do unaligned read without any problem.

Signed-off-by: Qu Wenruo 
---
 fs/fs.c   |  2 +-
 fs/ubifs/ubifs.c  | 13 -
 include/ubifs_uboot.h |  1 +
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index 3eb540c5fe30..2b847d83597b 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -311,7 +311,7 @@ static struct fstype_info fstypes[] = {
.exists = ubifs_exists,
.size = ubifs_size,
.read = ubifs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ubifs_get_blocksize,
.write = fs_write_unsupported,
.uuid = fs_uuid_unsupported,
.opendir = fs_opendir_unsupported,
diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index d3026e310168..a8ab556dd376 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t 
offset,
 
*actread = 0;
 
-   if (offset & (PAGE_SIZE - 1)) {
-   printf("ubifs: Error offset must be a multiple of %d\n",
-  PAGE_SIZE);
-   return -1;
-   }
+   /* Higher layer should ensure it always pass page aligned range. */
+   assert(IS_ALIGNED(offset, PAGE_SIZE));
+   assert(IS_ALIGNED(size, PAGE_SIZE));
 
c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY);
/* ubifs_findfile will resolve symlinks, so we know that we get
@@ -920,6 +918,11 @@ out:
return err;
 }
 
+int ubifs_get_blocksize(const char *filename)
+{
+   return PAGE_SIZE;
+}
+
 void ubifs_close(void)
 {
 }
diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h
index b025779d59ff..bcd21715314a 100644
--- a/include/ubifs_uboot.h
+++ b/include/ubifs_uboot.h
@@ -29,6 +29,7 @@ int ubifs_exists(const char *filename);
 int ubifs_size(const char *filename, loff_t *size);
 int ubifs_read(const char *filename, void *buf, loff_t offset,
   loff_t size, loff_t *actread);
+int ubifs_get_blocksize(const char *filename);
 void ubifs_close(void);
 
 #endif /* __UBIFS_UBOOT_H__ */
-- 
2.37.0



[PATCH v2 6/8] fs: fat: rely on higher layer to get block aligned read range

2022-07-25 Thread Qu Wenruo
Just implement fat_get_blocksize() for fat, so that fat_read_file()
always get a block aligned read range.

Unfortunately I'm not experienced enough to cleanup the fat code, thus
further cleanup is appreciated.

Cc: Tom Rini 
Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 13 +
 fs/fs.c   |  2 +-
 include/fat.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index dcceccbcee0a..e13035e8e6d1 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, 
loff_t offset, loff_t len,
return ret;
 }
 
+int fat_get_blocksize(const char *filename)
+{
+   fsdata fsdata = {0};
+   int ret;
+
+   ret = get_fs_info();
+   if (ret)
+   return ret;
+
+   free(fsdata.fatbuf);
+   return fsdata.sect_size;
+}
+
 typedef struct {
struct fs_dir_stream parent;
struct fs_dirent dirent;
diff --git a/fs/fs.c b/fs/fs.c
index 3d6cc6b38b26..3eb540c5fe30 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = {
.exists = fat_exists,
.size = fat_size,
.read = fat_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = fat_get_blocksize,
 #if CONFIG_IS_ENABLED(FAT_WRITE)
.write = file_fat_write,
.unlink = fat_unlink,
diff --git a/include/fat.h b/include/fat.h
index a9756fb4cd1b..c03a2bebecef 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -201,6 +201,7 @@ int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
+int fat_get_blocksize(const char *filename);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
 
-- 
2.37.0



[PATCH v2 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read

2022-07-25 Thread Qu Wenruo
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file().

Signed-off-by: Qu Wenruo 
---
 fs/ext4/ext4fs.c | 22 ++
 fs/fs.c  |  2 +-
 include/ext4fs.h |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c
index 4c89152ce4ad..b0568e77a895 100644
--- a/fs/ext4/ext4fs.c
+++ b/fs/ext4/ext4fs.c
@@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
short status;
struct ext_block_cache cache;
 
+   assert(IS_ALIGNED(pos, blocksize));
+
ext_cache_init();
 
/* Adjust len so it we can't read past the end of the file. */
@@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t 
offset, loff_t len,
return ext4fs_read(buf, offset, len, len_read);
 }
 
+int ext4_get_blocksize(const char *filename)
+{
+   struct ext_filesystem *fs;
+   int log2blksz;
+   int log2_fs_blocksize;
+   loff_t file_len;
+   int ret;
+
+   ret = ext4fs_open(filename, _len);
+   if (ret < 0) {
+   printf("** File not found %s **\n", filename);
+   return -1;
+   }
+   fs = get_fs();
+   log2blksz = fs->dev_desc->log2blksz;
+   log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz;
+
+   return (1 << (log2_fs_blocksize + log2blksz));
+}
+
 int ext4fs_uuid(char *uuid_str)
 {
if (ext4fs_root == NULL)
diff --git a/fs/fs.c b/fs/fs.c
index 1e9f778e1f11..3d6cc6b38b26 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = {
.exists = ext4fs_exists,
.size = ext4fs_size,
.read = ext4_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ext4_get_blocksize,
 #ifdef CONFIG_CMD_EXT4_WRITE
.write = ext4_write_file,
.ln = ext4fs_create_link,
diff --git a/include/ext4fs.h b/include/ext4fs.h
index cb5d9cc0a5c0..0f4cf32dcc2a 100644
--- a/include/ext4fs.h
+++ b/include/ext4fs.h
@@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc,
 struct disk_partition *fs_partition);
 int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len,
   loff_t *actread);
+int ext4_get_blocksize(const char *filename);
 int ext4_read_superblock(char *buffer);
 int ext4fs_uuid(char *uuid_str);
 void ext_cache_init(struct ext_block_cache *cache);
-- 
2.37.0



[PATCH v2 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs

2022-07-25 Thread Qu Wenruo
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code
to handle the unaligned read (aka, read range is not aligned to fs block
size).

This makes underlying fs code harder to implement, as  they have to handle
unaligned read all by themselves.

This patch will change the behavior, starting from btrfs, by moving the
unaligned read code into _fs_read().

The idea is pretty simple, if we have an unaligned read request, we
handle it in the following steps:

1. Grab the blocksize of the fs

2. Read the leading unaligned range
   We will read the block that @offset is in, and copy the
   requested part into buf.

   The the block we read covers the whole range, we just call it a day.

3. Read the remaining part
   The tailing part may be unaligned, but all fses handles the tailing
   part much easier than the leading unaligned part.

   As they just need to do a min(extent_size, start + len - cur) to
   calculate the real read size.

   In fact, for most file reading, the file size is not aligned and we
   need to handle the tailing part anyway.

There is a btrfs specific cleanup involved:

- In btrfs_file_read(), merge the tailing unaligned read into the main
  loop.
  Just reuse the existing read length calculation is enough.

- Remove read_and_truncate_page() call
  Since there is no explicit leading/tailing unaligned read anymore.

This has been tested with a proper randomly populated btrfs file, then
tried in sandbox mode with different aligned and unaligned range and
compare the output with md5sum.

Cc: Marek Behun 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c |  10 
 fs/btrfs/inode.c |  89 +++-
 fs/fs.c  | 130 ---
 include/btrfs.h  |   1 +
 4 files changed, 141 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index bf9e1f2f17cf..7c8f4a3dfb87 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
int ret;
 
ASSERT(fs_info);
+
+   /* Higher layer has ensures it never pass unaligned offset in. */
+   ASSERT(IS_ALIGNED(offset, fs_info->sectorsize));
+
ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID,
file, , , , 40);
if (ret < 0) {
@@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return 0;
 }
 
+int btrfs_get_blocksize(const char *filename)
+{
+   ASSERT(current_fs_info);
+   return current_fs_info->sectorsize;
+}
+
 void btrfs_close(void)
 {
if (current_fs_info) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0173d30cd8ab..aa198c5aaf1f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -617,44 +617,6 @@ check_next:
return 1;
 }
 
-static int read_and_truncate_page(struct btrfs_path *path,
- struct btrfs_file_extent_item *fi,
- int start, int len, char *dest)
-{
-   struct extent_buffer *leaf = path->nodes[0];
-   struct btrfs_fs_info *fs_info = leaf->fs_info;
-   u64 aligned_start = round_down(start, fs_info->sectorsize);
-   u8 extent_type;
-   char *buf;
-   int page_off = start - aligned_start;
-   int page_len = fs_info->sectorsize - page_off;
-   int ret;
-
-   ASSERT(start + len <= aligned_start + fs_info->sectorsize);
-   buf = malloc_cache_aligned(fs_info->sectorsize);
-   if (!buf)
-   return -ENOMEM;
-
-   extent_type = btrfs_file_extent_type(leaf, fi);
-   if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
-   free(buf);
-   return len;
-   }
-
-   ret = btrfs_read_extent_reg(path, fi,
-   round_down(start, fs_info->sectorsize),
-   fs_info->sectorsize, buf);
-   if (ret < 0) {
-   free(buf);
-   return ret;
-   }
-   memcpy(dest, buf + page_off, page_len);
-   free(buf);
-   return len;
-}
-
 int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
char *dest)
 {
@@ -663,7 +625,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
struct btrfs_path path;
struct btrfs_key key;
u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
-   u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
u64 next_offset;
u64 cur = aligned_start;
int ret = 0;
@@ -673,34 +634,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
/* Set the whole dest all zero, so we won't nee

[PATCH v2 3/8] fs: btrfs: fix a crash if specified range is beyond file size

2022-07-25 Thread Qu Wenruo
[BUG]
When try to read a range beyond file size, btrfs driver will cause
crash/segfault:

 => load host 0 $kernel_addr_r 5k_file 0 0x2000
 SEGFAULT

[CAUSE]
In btrfs_read(), if @len is 0, we will truncated it to file end, but if
file end is beyond our file size, this truncation will underflow @len,
making it -3K in this case.

And later that @len is used to memzero the output buffer, resulting
above crash.

[FIX]
Just error out if @offset is already beyond our file size.

Now it will fail properly with correct error message:

 => load host 0 $kernel_addr_r 5m_origin 0 0x2000
 BTRFS: Read range beyond file size, offset 8192 file size 5120

 Failed to load '5m_origin'

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 9145727058d4..bf9e1f2f17cf 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return ret;
}
 
+   if (offset >= real_size) {
+   error("Read range beyond file size, offset %llu file size %llu",
+   offset, real_size);
+   return -EINVAL;
+   }
+
/*
 * If the length is 0 (meaning read the whole file) or the range is
 * beyond file size, truncate it to the end of the file.
-- 
2.37.0



[PATCH v2 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0

2022-07-25 Thread Qu Wenruo
[BUG]
When testing with unaligned read, if a specific length is passed in,
btrfs driver will read out nothing:

 => load host 0 $kernel_addr_r 5k_file 0x1000 0
 0 bytes read in 0 ms

But if no length is passed in, it works fine, even if we pass a non-zero
length:

 => load host 0 $kernel_addr_r 5k_file 0 0x1000
 1024 bytes read in 0 ms

[CAUSE]
In btrfs_read() if we have a larger size than our file, we will try to
truncate it using the file size.

However the real file size is not initialized if @len is not zero, thus
we always truncate our length to 0, and cause the problem.

[FIX]
Fix it by just always do the file size check.

In fact btrfs_size() always follow soft link, thus it will return the
real file size correctly.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 741c6e20f533..9145727058d4 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t 
offset, loff_t len,
return -EINVAL;
}
 
-   if (!len) {
-   ret = btrfs_size(file, _size);
-   if (ret < 0) {
-   error("Failed to get inode size: %s", file);
-   return ret;
-   }
-   len = real_size;
+   ret = btrfs_size(file, _size);
+   if (ret < 0) {
+   error("Failed to get inode size: %s", file);
+   return ret;
}
 
-   if (len > real_size - offset)
+   /*
+* If the length is 0 (meaning read the whole file) or the range is
+* beyond file size, truncate it to the end of the file.
+*/
+   if (!len || len > real_size - offset)
len = real_size - offset;
 
ret = btrfs_file_read(root, ino, offset, len, buf);
-- 
2.37.0



[PATCH v2 1/8] fs: fat: unexport file_fat_read_at()

2022-07-25 Thread Qu Wenruo
That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 4 ++--
 include/fat.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index df9ea2c028fc..dcceccbcee0a 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1243,8 +1243,8 @@ out_free_itr:
return ret;
 }
 
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread)
+static int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
+   loff_t maxsize, loff_t *actread)
 {
fsdata fsdata;
fat_itr *itr;
diff --git a/include/fat.h b/include/fat.h
index bd8e450b33a3..a9756fb4cd1b 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect)
 int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
-- 
2.37.0



[PATCH v2 0/8] U-boot: fs: add generic unaligned read offset handling

2022-07-25 Thread Qu Wenruo
[CHANGELOG]
v2->v1:
- Fix a linkage error where (U64 % U32) is called without proper helper
  Fix it with U64 & (U32 - 1), as the U32 value (@blocksize) should
  always be power of 2, thus (@blocksize - 1) is the mask we want to
  calculate the offset inside the block.

  Above change only affects the 4th patch, everything else is not
  touched.

RFC->v1:
- More (manual) testing
  Unfortunately, in the latest master (75967970850a), the fs-tests.sh
  always seems to hang at preparing the fs image.

  Thus still has to do manual testing, tested btrfs, ext4 and fat, with
  aligned and unaligned read, also added soft link read, all looks fine here.

  Extra testing is still appreciated.

- Two more btrfs specific bug fixes
  All exposed during manual tests

- Remove the tailing unaligned block handling
  In fact, all fses can easily handle such case, just a min() call is
  enough.

- Remove the support for sandboxfs
  Since it's using read() calls, really no need to do block alignment
  check.

- Enhanced blocksize check
  Ensure the returned blocksize is not only non-error, but also
  non-zero.

This patchset can be fetched from github:
https://github.com/adam900710/u-boot/tree/fs_unaligned_read

[BACKGROUND]
Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code
just pass the request range to underlying fses.

Under most case, this works fine, as U-boot only really needs to read
the whole file (aka, 0 for both offset and len, len will be later
determined using file size).

But if some advanced user/script wants to extract kernel/initramfs from
combined image, we may need to do unaligned read in that case.

[ADVANTAGE]
This patchset will handle unaligned read range in _fs_read():

- Get blocksize of the underlying fs

- Read the leading block contianing the unaligned range
  The full block will be stored in a local buffer, then only copy
  the bytes in the unaligned range into the destination buffer.

  If the first block covers the whole range, we just call it aday.

- Read the remaining range which starts at block aligned offset
  For most common case, which is 0 offset and 0 len, the code will not
  be changed at all.

Just one extra get_blocksize() call, and for FAT/Btrfs/EROFS they all have
cached blocksize, thus it takes almost no extra cost.

Although for EXT4, it doesn't seem to cache the blocksize globally,
thus has to do a path resolve and grab the blocksize.

[DISADVANTAGE]
The involved problem is:

- Extra path resolving
  All those supported fs may have to do one extra path resolve if the
  read offset is not aligned.

  For EXT4, it will do one extra path resolve just to grab the
  blocksize.

For data read which starts at offset 0 (the most common case), it
should cause *NO* difference in performance.
As the extra handling is only for unaligned offset.

The common path is not really modified.

[SUPPORTED FSES]

- Btrfs (manually tested*)
- Ext4 (manually tested)
- FAT (manually tested)
- Erofs
- ubifs (unable to test, due to compile failure)

*: Failed to get the test cases run, thus have to go sandbox mode, and
attach an image with target fs, load the target file (with unaligned
range) and compare the result using md5sum.

For EXT4/FAT, they may need extra cleanup, as their existing unaligned
range handling is no longer needed anymore, cleaning them up should free 
more code lines than the added one.

Just not confident enough to modify them all by myself.

[UNSUPPORTED FSES]
- Squashfs
  They don't support non-zero offset, thus it can not handle the block
  aligned range.
  Need extra help to add block aligned offset support.

- Semihostfs
- Sandboxfs
  They all use read() directly, no need to do alignment check at all.

Extra testing/feedback is always appreciated.

Qu Wenruo (8):
  fs: fat: unexport file_fat_read_at()
  fs: btrfs: fix a bug which no data get read if the length is not 0
  fs: btrfs: fix a crash if specified range is beyond file size
  fs: btrfs: move the unaligned read code to _fs_read() for btrfs
  fs: ext4: rely on _fs_read() to handle leading unaligned block read
  fs: fat: rely on higher layer to get block aligned read range
  fs: ubifs: rely on higher layer to do unaligned read
  fs: erofs: add unaligned read range handling

 fs/btrfs/btrfs.c  |  33 ---
 fs/btrfs/inode.c  |  89 +++--
 fs/erofs/internal.h   |   1 +
 fs/erofs/super.c  |   6 ++
 fs/ext4/ext4fs.c  |  22 +++
 fs/fat/fat.c  |  17 +-
 fs/fs.c   | 130 +++---
 fs/ubifs/ubifs.c  |  13 +++--
 include/btrfs.h   |   1 +
 include/erofs.h   |   1 +
 include/ext4fs.h  |   1 +
 include/fat.h |   3 +-
 include/ubifs_uboot.h |   1 +
 13 files changed, 212 insertions(+), 106 deletions(-)

-- 
2.37.0



Re: [PATCH 1/8] fs: fat: unexport file_fat_read_at()

2022-07-25 Thread Qu Wenruo




On 2022/7/26 06:28, Tom Rini wrote:

On Wed, Jun 29, 2022 at 07:38:22PM +0800, Qu Wenruo wrote:


That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 


The series has a fails to build on nokia_rx51:
https://source.denx.de/u-boot/u-boot/-/jobs/471877#L483
which to me says doing 64bit division (likely related to block size,
etc) without using the appropriate helper macros to turn them in to bit
shifts instead.


Should I update and resend the series or just send the incremental
update to fix the U64/U32 division?

Thanks,
Qu


Re: [PATCH RFC 6/8] fs: sandboxfs: add sandbox_fs_get_blocksize()

2022-06-30 Thread Qu Wenruo




On 2022/6/30 18:06, Simon Glass wrote:

On Tue, 28 Jun 2022 at 01:28, Qu Wenruo  wrote:


This is to make sandboxfs to report blocksize it supports for
_fs_read() to handle unaligned read.

Unlike all other fses, sandboxfs can handle unaligned read/write without
any problem since it's calling read()/write(), which doesn't bother the
blocksize at all.

This change is mostly to make testing of _fs_read() much easier.

Cc: Simon Glass 
Signed-off-by: Qu Wenruo 
---
  arch/sandbox/cpu/os.c  | 11 +++
  fs/fs.c|  2 +-
  fs/sandbox/sandboxfs.c | 14 ++
  include/os.h   |  8 
  include/sandboxfs.h|  1 +
  5 files changed, 35 insertions(+), 1 deletion(-)


Reviewed-by: Simon Glass 

with a comment as requested below



diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c
index 5ea54179176c..6c29f29bdd9b 100644
--- a/arch/sandbox/cpu/os.c
+++ b/arch/sandbox/cpu/os.c
@@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count)
 return read(fd, buf, count);
  }

+ssize_t os_get_blocksize(int fd)
+{
+   struct stat stat = {0};
+   int ret;
+
+   ret = fstat(fd, );
+   if (ret < 0)
+   return -errno;
+   return stat.st_blksize;
+}
+
  ssize_t os_write(int fd, const void *buf, size_t count)
  {
 return write(fd, buf, count);
diff --git a/fs/fs.c b/fs/fs.c
index 7e4ead9b790b..337d5711c28c 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = {
 .exists = sandbox_fs_exists,
 .size = sandbox_fs_size,
 .read = fs_read_sandbox,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = sandbox_fs_get_blocksize,
 .write = fs_write_sandbox,
 .uuid = fs_uuid_unsupported,
 .opendir = fs_opendir_unsupported,
diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c
index 4ae41d5b4db1..130fee088621 100644
--- a/fs/sandbox/sandboxfs.c
+++ b/fs/sandbox/sandboxfs.c
@@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, 
void *buffer,
 return ret;
  }

+int sandbox_fs_get_blocksize(const char *filename)
+{
+   int fd;
+   int ret;
+
+   fd = os_open(filename, OS_O_RDONLY);
+   if (fd < 0)
+   return fd;
+
+   ret = os_get_blocksize(fd);
+   os_close(fd);
+   return ret;
+}
+
  int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer,
 loff_t towrite, loff_t *actwrite)
  {
diff --git a/include/os.h b/include/os.h
index 10e198cf503e..a864d9ca39b2 100644
--- a/include/os.h
+++ b/include/os.h
@@ -26,6 +26,14 @@ struct sandbox_state;
   */
  ssize_t os_read(int fd, void *buf, size_t count);

+/**
+ * Get the optimial blocksize through stat() call.
+ *
+ * @fd:File descriptor as returned by os_open()
+ * Return: >=0 for the blocksize. <0 for error.
+ */
+ssize_t os_get_blocksize(int fd);
+
  /**
   * Access to the OS write() system call
   *
diff --git a/include/sandboxfs.h b/include/sandboxfs.h
index 783dd5c88a73..6937068f7b82 100644
--- a/include/sandboxfs.h
+++ b/include/sandboxfs.h
@@ -32,6 +32,7 @@ void sandbox_fs_close(void);
  int sandbox_fs_ls(const char *dirname);
  int sandbox_fs_exists(const char *filename);
  int sandbox_fs_size(const char *filename, loff_t *size);
+int sandbox_fs_get_blocksize(const char *filename);


Please add a full comment.


This is already removed in the formal version:
https://patchwork.kernel.org/project/linux-btrfs/list/?series=654990

As sandbox is just calling host OS read() to handle IO, thus it doesn't 
need to bother the alignment at all.


Thanks,
Qu




  int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t 
len,
 loff_t *actread);
  int fs_write_sandbox(const char *filename, void *buf, loff_t offset,
--
2.36.1



Regards,
Simon


[PATCH 7/8] fs: ubifs: rely on higher layer to do unaligned read

2022-06-29 Thread Qu Wenruo
Currently ubifs doesn't support unaligned read offset, thanks to the
recent _fs_read() work to handle unaligned read, we only need to
implement ubifs_get_blocksize() to take advantage of it.

Now ubifs can do unaligned read without any problem.

Signed-off-by: Qu Wenruo 
---
Unfortunately I can not test ubifs, as enabling UBI would cause compile
failure due to missing of  header.
---
 fs/fs.c   |  2 +-
 fs/ubifs/ubifs.c  | 13 -
 include/ubifs_uboot.h |  1 +
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index 09625f52e913..61bae1051406 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -311,7 +311,7 @@ static struct fstype_info fstypes[] = {
.exists = ubifs_exists,
.size = ubifs_size,
.read = ubifs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ubifs_get_blocksize,
.write = fs_write_unsupported,
.uuid = fs_uuid_unsupported,
.opendir = fs_opendir_unsupported,
diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index d3026e310168..a8ab556dd376 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t 
offset,
 
*actread = 0;
 
-   if (offset & (PAGE_SIZE - 1)) {
-   printf("ubifs: Error offset must be a multiple of %d\n",
-  PAGE_SIZE);
-   return -1;
-   }
+   /* Higher layer should ensure it always pass page aligned range. */
+   assert(IS_ALIGNED(offset, PAGE_SIZE));
+   assert(IS_ALIGNED(size, PAGE_SIZE));
 
c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY);
/* ubifs_findfile will resolve symlinks, so we know that we get
@@ -920,6 +918,11 @@ out:
return err;
 }
 
+int ubifs_get_blocksize(const char *filename)
+{
+   return PAGE_SIZE;
+}
+
 void ubifs_close(void)
 {
 }
diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h
index b025779d59ff..bcd21715314a 100644
--- a/include/ubifs_uboot.h
+++ b/include/ubifs_uboot.h
@@ -29,6 +29,7 @@ int ubifs_exists(const char *filename);
 int ubifs_size(const char *filename, loff_t *size);
 int ubifs_read(const char *filename, void *buf, loff_t offset,
   loff_t size, loff_t *actread);
+int ubifs_get_blocksize(const char *filename);
 void ubifs_close(void);
 
 #endif /* __UBIFS_UBOOT_H__ */
-- 
2.36.1



[PATCH 8/8] fs: erofs: add unaligned read range handling

2022-06-29 Thread Qu Wenruo
I'm not an expert on erofs, but my quick glance didn't expose any
special handling on unaligned range, thus I think the U-boot erofs
driver doesn't really support unaligned read range.

This patch will add erofs_get_blocksize() so erofs can benefit from the
generic unaligned read support.

Cc: Huang Jianan 
Cc: linux-er...@lists.ozlabs.org
Signed-off-by: Qu Wenruo 
---
 fs/erofs/internal.h | 1 +
 fs/erofs/super.c| 6 ++
 fs/fs.c | 2 +-
 include/erofs.h | 1 +
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4af7c91560cc..d368a6481bf1 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -83,6 +83,7 @@ struct erofs_sb_info {
u16 available_compr_algs;
u16 lz4_max_distance;
u32 checksum;
+   u32 blocksize;
u16 extra_devices;
union {
u16 devt_slotoff;   /* used for mkfs */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 4cca322b9ead..df01d2e719a7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -99,7 +99,13 @@ int erofs_read_superblock(void)
 
sbi.build_time = le64_to_cpu(dsb->build_time);
sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
+   sbi.blocksize = 1 << blkszbits;
 
memcpy(, dsb->uuid, sizeof(dsb->uuid));
return erofs_init_devices(, dsb);
 }
+
+int erofs_get_blocksize(const char *filename)
+{
+   return sbi.blocksize;
+}
diff --git a/fs/fs.c b/fs/fs.c
index 61bae1051406..e92174d89c28 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -375,7 +375,7 @@ static struct fstype_info fstypes[] = {
.readdir = erofs_readdir,
.ls = fs_ls_generic,
.read = erofs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = erofs_get_blocksize,
.size = erofs_size,
.close = erofs_close,
.closedir = erofs_closedir,
diff --git a/include/erofs.h b/include/erofs.h
index 1fbe82bf72cb..18bd6807c538 100644
--- a/include/erofs.h
+++ b/include/erofs.h
@@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc,
struct disk_partition *fs_partition);
 int erofs_read(const char *filename, void *buf, loff_t offset,
   loff_t len, loff_t *actread);
+int erofs_get_blocksize(const char *filename);
 int erofs_size(const char *filename, loff_t *size);
 int erofs_exists(const char *filename);
 void erofs_close(void);
-- 
2.36.1



[PATCH 5/8] fs: ext4: rely on _fs_read() to handle leading unaligned block read

2022-06-29 Thread Qu Wenruo
Just add ext4_get_blocksize() and a new assert() in ext4fs_read_file().

Signed-off-by: Qu Wenruo 
---
Several cleanup candicate:

1. ext_fs->blksz is never populated, thus it's always 0
   We can not easily grab blocksize just like btrfs in this case.

   Thus we have to go the same calculation in ext4fs_read_file().

2. can not easily cleanup @skipfirst in ext4fs_read_file()
   It seems to screw up with soft link read.
---
 fs/ext4/ext4fs.c | 22 ++
 fs/fs.c  |  2 +-
 include/ext4fs.h |  1 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c
index 4c89152ce4ad..b0568e77a895 100644
--- a/fs/ext4/ext4fs.c
+++ b/fs/ext4/ext4fs.c
@@ -69,6 +69,8 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
short status;
struct ext_block_cache cache;
 
+   assert(IS_ALIGNED(pos, blocksize));
+
ext_cache_init();
 
/* Adjust len so it we can't read past the end of the file. */
@@ -259,6 +261,26 @@ int ext4_read_file(const char *filename, void *buf, loff_t 
offset, loff_t len,
return ext4fs_read(buf, offset, len, len_read);
 }
 
+int ext4_get_blocksize(const char *filename)
+{
+   struct ext_filesystem *fs;
+   int log2blksz;
+   int log2_fs_blocksize;
+   loff_t file_len;
+   int ret;
+
+   ret = ext4fs_open(filename, _len);
+   if (ret < 0) {
+   printf("** File not found %s **\n", filename);
+   return -1;
+   }
+   fs = get_fs();
+   log2blksz = fs->dev_desc->log2blksz;
+   log2_fs_blocksize = LOG2_BLOCK_SIZE(ext4fs_file->data) - log2blksz;
+
+   return (1 << (log2_fs_blocksize + log2blksz));
+}
+
 int ext4fs_uuid(char *uuid_str)
 {
if (ext4fs_root == NULL)
diff --git a/fs/fs.c b/fs/fs.c
index f9df8d9ec9a4..3cd2c88a4895 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = {
.exists = ext4fs_exists,
.size = ext4fs_size,
.read = ext4_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ext4_get_blocksize,
 #ifdef CONFIG_CMD_EXT4_WRITE
.write = ext4_write_file,
.ln = ext4fs_create_link,
diff --git a/include/ext4fs.h b/include/ext4fs.h
index cb5d9cc0a5c0..0f4cf32dcc2a 100644
--- a/include/ext4fs.h
+++ b/include/ext4fs.h
@@ -161,6 +161,7 @@ int ext4fs_probe(struct blk_desc *fs_dev_desc,
 struct disk_partition *fs_partition);
 int ext4_read_file(const char *filename, void *buf, loff_t offset, loff_t len,
   loff_t *actread);
+int ext4_get_blocksize(const char *filename);
 int ext4_read_superblock(char *buffer);
 int ext4fs_uuid(char *uuid_str);
 void ext_cache_init(struct ext_block_cache *cache);
-- 
2.36.1



[PATCH 4/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs

2022-06-29 Thread Qu Wenruo
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code
to handle the unaligned read (aka, read range is not aligned to fs block
size).

This makes underlying fs code harder to implement, as  they have to handle
unaligned read all by themselves.

This patch will change the behavior, starting from btrfs, by moving the
unaligned read code into _fs_read().

The idea is pretty simple, if we have an unaligned read request, we
handle it in the following steps:

1. Grab the blocksize of the fs

2. Read the leading unaligned range
   We will read the block that @offset is in, and copy the
   requested part into buf.

   The the block we read covers the whole range, we just call it a day.

3. Read the remaining part
   The tailing part may be unaligned, but all fses handles the tailing
   part much easier than the leading unaligned part.

   As they just need to do a min(extent_size, start + len - cur) to
   calculate the real read size.

   In fact, for most file reading, the file size is not aligned and we
   need to handle the tailing part anyway.

There is a btrfs specific cleanup involved:

- In btrfs_file_read(), merge the tailing unaligned read into the main
  loop.
  Just reuse the existing read length calculation is enough.

- Remove read_and_truncate_page() call
  Since there is no explicit leading/tailing unaligned read anymore.

This has been tested with a proper randomly populated btrfs file, then
tried in sandbox mode with different aligned and unaligned range and
compare the output with md5sum.

Cc: Marek Behun 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c |  10 
 fs/btrfs/inode.c |  89 +++-
 fs/fs.c  | 130 ---
 include/btrfs.h  |   1 +
 4 files changed, 141 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index bf9e1f2f17cf..7c8f4a3dfb87 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -234,6 +234,10 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
int ret;
 
ASSERT(fs_info);
+
+   /* Higher layer has ensures it never pass unaligned offset in. */
+   ASSERT(IS_ALIGNED(offset, fs_info->sectorsize));
+
ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID,
file, , , , 40);
if (ret < 0) {
@@ -275,6 +279,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return 0;
 }
 
+int btrfs_get_blocksize(const char *filename)
+{
+   ASSERT(current_fs_info);
+   return current_fs_info->sectorsize;
+}
+
 void btrfs_close(void)
 {
if (current_fs_info) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d00b5153336d..bef0a972f40d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -620,44 +620,6 @@ check_next:
return 1;
 }
 
-static int read_and_truncate_page(struct btrfs_path *path,
- struct btrfs_file_extent_item *fi,
- int start, int len, char *dest)
-{
-   struct extent_buffer *leaf = path->nodes[0];
-   struct btrfs_fs_info *fs_info = leaf->fs_info;
-   u64 aligned_start = round_down(start, fs_info->sectorsize);
-   u8 extent_type;
-   char *buf;
-   int page_off = start - aligned_start;
-   int page_len = fs_info->sectorsize - page_off;
-   int ret;
-
-   ASSERT(start + len <= aligned_start + fs_info->sectorsize);
-   buf = malloc_cache_aligned(fs_info->sectorsize);
-   if (!buf)
-   return -ENOMEM;
-
-   extent_type = btrfs_file_extent_type(leaf, fi);
-   if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
-   free(buf);
-   return len;
-   }
-
-   ret = btrfs_read_extent_reg(path, fi,
-   round_down(start, fs_info->sectorsize),
-   fs_info->sectorsize, buf);
-   if (ret < 0) {
-   free(buf);
-   return ret;
-   }
-   memcpy(dest, buf + page_off, page_len);
-   free(buf);
-   return len;
-}
-
 int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
char *dest)
 {
@@ -666,7 +628,6 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
struct btrfs_path path;
struct btrfs_key key;
u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
-   u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
u64 next_offset;
u64 cur = aligned_start;
int ret = 0;
@@ -676,34 +637,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
/* Set the whole dest all zero, so we won't nee

[PATCH 6/8] fs: fat: rely on higher layer to get block aligned read range

2022-06-29 Thread Qu Wenruo
Just implement fat_get_blocksize() for fat, so that fat_read_file()
always get a block aligned read range.

Unfortunately I'm not experienced enough to cleanup the fat code, thus
further cleanup is appreciated.

Cc: Tom Rini 
Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 13 +
 fs/fs.c   |  2 +-
 include/fat.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index dcceccbcee0a..e13035e8e6d1 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, 
loff_t offset, loff_t len,
return ret;
 }
 
+int fat_get_blocksize(const char *filename)
+{
+   fsdata fsdata = {0};
+   int ret;
+
+   ret = get_fs_info();
+   if (ret)
+   return ret;
+
+   free(fsdata.fatbuf);
+   return fsdata.sect_size;
+}
+
 typedef struct {
struct fs_dir_stream parent;
struct fs_dirent dirent;
diff --git a/fs/fs.c b/fs/fs.c
index 3cd2c88a4895..09625f52e913 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = {
.exists = fat_exists,
.size = fat_size,
.read = fat_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = fat_get_blocksize,
 #if CONFIG_IS_ENABLED(FAT_WRITE)
.write = file_fat_write,
.unlink = fat_unlink,
diff --git a/include/fat.h b/include/fat.h
index a9756fb4cd1b..c03a2bebecef 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -201,6 +201,7 @@ int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
+int fat_get_blocksize(const char *filename);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
 
-- 
2.36.1



[PATCH 3/8] fs: btrfs: fix a crash if specified range is beyond file size

2022-06-29 Thread Qu Wenruo
[BUG]
When try to read a range beyond file size, btrfs driver will cause
crash/segfault:

 => load host 0 $kernel_addr_r 5k_file 0 0x2000
 SEGFAULT

[CAUSE]
In btrfs_read(), if @len is 0, we will truncated it to file end, but if
file end is beyond our file size, this truncation will underflow @len,
making it -3K in this case.

And later that @len is used to memzero the output buffer, resulting
above crash.

[FIX]
Just error out if @offset is already beyond our file size.

Now it will fail properly with correct error message:

 => load host 0 $kernel_addr_r 5m_origin 0 0x2000
 BTRFS: Read range beyond file size, offset 8192 file size 5120

 Failed to load '5m_origin'

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 9145727058d4..bf9e1f2f17cf 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -252,6 +252,12 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return ret;
}
 
+   if (offset >= real_size) {
+   error("Read range beyond file size, offset %llu file size %llu",
+   offset, real_size);
+   return -EINVAL;
+   }
+
/*
 * If the length is 0 (meaning read the whole file) or the range is
 * beyond file size, truncate it to the end of the file.
-- 
2.36.1



[PATCH 2/8] fs: btrfs: fix a bug which no data get read if the length is not 0

2022-06-29 Thread Qu Wenruo
[BUG]
When testing with unaligned read, if a specific length is passed in,
btrfs driver will read out nothing:

 => load host 0 $kernel_addr_r 5k_file 0x1000 0
 0 bytes read in 0 ms

But if no length is passed in, it works fine, even if we pass a non-zero
length:

 => load host 0 $kernel_addr_r 5k_file 0 0x1000
 1024 bytes read in 0 ms

[CAUSE]
In btrfs_read() if we have a larger size than our file, we will try to
truncate it using the file size.

However the real file size is not initialized if @len is not zero, thus
we always truncate our length to 0, and cause the problem.

[FIX]
Fix it by just always do the file size check.

In fact btrfs_size() always follow soft link, thus it will return the
real file size correctly.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 741c6e20f533..9145727058d4 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -246,16 +246,17 @@ int btrfs_read(const char *file, void *buf, loff_t 
offset, loff_t len,
return -EINVAL;
}
 
-   if (!len) {
-   ret = btrfs_size(file, _size);
-   if (ret < 0) {
-   error("Failed to get inode size: %s", file);
-   return ret;
-   }
-   len = real_size;
+   ret = btrfs_size(file, _size);
+   if (ret < 0) {
+   error("Failed to get inode size: %s", file);
+   return ret;
}
 
-   if (len > real_size - offset)
+   /*
+* If the length is 0 (meaning read the whole file) or the range is
+* beyond file size, truncate it to the end of the file.
+*/
+   if (!len || len > real_size - offset)
len = real_size - offset;
 
ret = btrfs_file_read(root, ino, offset, len, buf);
-- 
2.36.1



[PATCH 1/8] fs: fat: unexport file_fat_read_at()

2022-06-29 Thread Qu Wenruo
That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 4 ++--
 include/fat.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index df9ea2c028fc..dcceccbcee0a 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1243,8 +1243,8 @@ out_free_itr:
return ret;
 }
 
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread)
+static int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
+   loff_t maxsize, loff_t *actread)
 {
fsdata fsdata;
fat_itr *itr;
diff --git a/include/fat.h b/include/fat.h
index bd8e450b33a3..a9756fb4cd1b 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect)
 int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
-- 
2.36.1



[PATCH 0/8] U-boot: fs: add generic unaligned read offset handling

2022-06-29 Thread Qu Wenruo
[CHANGELOG]
RFC->v1:
- More (manual) testing
  Unfortunately, in the latest master (75967970850a), the fs-tests.sh
  always seems to hang at preparing the fs image.

  Thus still has to do manual testing, tested btrfs, ext4 and fat, with
  aligned and unaligned read, also added soft link read, all looks fine here.

  Extra testing is still appreciated.

- Two more btrfs specific bug fixes
  All exposed during manual tests

- Remove the tailing unaligned block handling
  In fact, all fses can easily handle such case, just a min() call is
  enough.

- Remove the support for sandboxfs
  Since it's using read() calls, really no need to do block alignment
  check.

- Enhanced blocksize check
  Ensure the returned blocksize is not only non-error, but also
  non-zero.

This patchset can be fetched from github:
https://github.com/adam900710/u-boot/tree/fs_unaligned_read

[BACKGROUND]
Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code
just pass the request range to underlying fses.

Under most case, this works fine, as U-boot only really needs to read
the whole file (aka, 0 for both offset and len, len will be later
determined using file size).

But if some advanced user/script wants to extract kernel/initramfs from
combined image, we may need to do unaligned read in that case.

[ADVANTAGE]
This patchset will handle unaligned read range in _fs_read():

- Get blocksize of the underlying fs

- Read the leading block contianing the unaligned range
  The full block will be stored in a local buffer, then only copy
  the bytes in the unaligned range into the destination buffer.

  If the first block covers the whole range, we just call it aday.

- Read the remaining range which starts at block aligned offset
  For most common case, which is 0 offset and 0 len, the code will not
  be changed at all.

Just one extra get_blocksize() call, and for FAT/Btrfs/EROFS they all have
cached blocksize, thus it takes almost no extra cost.

Although for EXT4, it doesn't seem to cache the blocksize globally,
thus has to do a path resolve and grab the blocksize.

[DISADVANTAGE]
The involved problem is:

- Extra path resolving
  All those supported fs may have to do one extra path resolve if the
  read offset is not aligned.

  For EXT4, it will do one extra path resolve just to grab the
  blocksize.

For data read which starts at offset 0 (the most common case), it
should cause *NO* difference in performance.
As the extra handling is only for unaligned offset.

The common path is not really modified.

[SUPPORTED FSES]

- Btrfs (manually tested*)
- Ext4 (manually tested)
- FAT (manually tested)
- Erofs
- ubifs (unable to test, due to compile failure)

*: Failed to get the test cases run, thus have to go sandbox mode, and
attach an image with target fs, load the target file (with unaligned
range) and compare the result using md5sum.

For EXT4/FAT, they may need extra cleanup, as their existing unaligned
range handling is no longer needed anymore, cleaning them up should free 
more code lines than the added one.

Just not confident enough to modify them all by myself.

[UNSUPPORTED FSES]
- Squashfs
  They don't support non-zero offset, thus it can not handle the block
  aligned range.
  Need extra help to add block aligned offset support.

- Semihostfs
- Sandboxfs
  They all use read() directly, no need to do alignment check at all.

Extra testing/feedback is always appreciated.

Qu Wenruo (8):
  fs: fat: unexport file_fat_read_at()
  fs: btrfs: fix a bug which no data get read if the length is not 0
  fs: btrfs: fix a crash if specified range is beyond file size
  fs: btrfs: move the unaligned read code to _fs_read() for btrfs
  fs: ext4: rely on _fs_read() to handle leading unaligned block read
  fs: fat: rely on higher layer to get block aligned read range
  fs: ubifs: rely on higher layer to do unaligned read
  fs: erofs: add unaligned read range handling

 fs/btrfs/btrfs.c  |  33 ---
 fs/btrfs/inode.c  |  89 +++--
 fs/erofs/internal.h   |   1 +
 fs/erofs/super.c  |   6 ++
 fs/ext4/ext4fs.c  |  22 +++
 fs/fat/fat.c  |  17 +-
 fs/fs.c   | 130 +++---
 fs/ubifs/ubifs.c  |  13 +++--
 include/btrfs.h   |   1 +
 include/erofs.h   |   1 +
 include/ext4fs.h  |   1 +
 include/fat.h |   3 +-
 include/ubifs_uboot.h |   1 +
 13 files changed, 212 insertions(+), 106 deletions(-)

-- 
2.36.1



Re: [PATCH 0/8] u-boot: fs: add generic unaligned read handling

2022-06-28 Thread Qu Wenruo




On 2022/6/28 22:17, Tom Rini wrote:

On Tue, Jun 28, 2022 at 03:28:00PM +0800, Qu Wenruo wrote:

[BACKGROUND]
Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code
just pass the request range to underlying fses.

Under most case, this works fine, as U-boot only really needs to read
the whole file (aka, 0 for both offset and len, len will be later
determined using file size).

But if some advanced user/script wants to extract kernel/initramfs from
combined image, we may need to do unaligned read in that case.

[ADVANTAGE]
This patchset will handle unaligned read range in _fs_read():

- Get blocksize of the underlying fs

- Read the leading block contianing the unaligned range
   The full block will be stored in a local buffer, then only copy
   the bytes in the unaligned range into the destination buffer.

   If the first block covers the whole range, we just call it aday.

- Read the aligned range if there is any

- Read the tailing block containing the unaligned range
   And copy the covered range into the destination.

[DISADVANTAGE]
There are mainly two problems:

- Extra memory allocation for every _fs_read() call
   For the leading and tailing block.

- Extra path resolving
   All those supported fs will have to do extra path resolving up to 2
   times (one for the leading block, one for the tailing block).
   This may slow down the read.


This conceptually seems like a good thing.  Can you please post some
before/after times of reading large images from the supported
filesystems?



One thing to mention is, this change doesn't really bother large file read.

As the patchset is splitting a large read into 3 parts:

1) Leading block
2) Aligned blocks, aka the main part of a large file
3) Tailing block

Most time should still be spent on part 2), not much different than the
old code. Part 1) and Part 3) are at most 2 blocks (aka, 2 * 4KiB for
most modern large enough fses).

So I doubt it would make any difference for large file read.


Furthermore, as pointed out by Huang Jianan, currently the patchset can
not handle read on soft link correctly, thus I'd update the series to do
the split into even less parts:

1) Leading block
   For the unaligned initial block

2) Aligned blocks until the end
   The tailing block should still starts at a block aligned position,
   thus most filesystems is already handling them correctly.
   (Just a min(end, blockend) is enough for most cases already).

Anyway, I'll try to craft some benchmarking for file reads using sandbox.
But please don't expect much (or any) difference in that case.

Thanks,
Qu


Re: [PATCH RFC 2/8] fs: always get the file size in _fs_read()

2022-06-28 Thread Qu Wenruo




On 2022/6/28 20:36, Huang Jianan wrote:

Hi, wenruo,

在 2022/6/28 15:28, Qu Wenruo 写道:

For _fs_read(), @len == 0 means we read the whole file.
And we just pass @len == 0 to make each filesystem to handle it.

In fact we have info->size() call to properly get the filesize.

So we can not only call info->size() to grab the file_size for len == 0
case, but also detect invalid @len (e.g. @len > file_size) in advance or
truncate @len.

This behavior also allows us to handle unaligned better in the incoming
patches.

Signed-off-by: Qu Wenruo 
---
  fs/fs.c | 25 +
  1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index 6de1a3eb6d5d..d992cdd6d650 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong
addr, loff_t offset, loff_t len,
  {
  struct fstype_info *info = fs_get_info(fs_type);
  void *buf;
+    loff_t file_size;
  int ret;
  #ifdef CONFIG_LMB
@@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong
addr, loff_t offset, loff_t len,
  }
  #endif
-    /*
- * We don't actually know how many bytes are being read, since
len==0
- * means read the whole file.
- */
+    ret = info->size(filename, _size);


I get an error when running the erofs test cases. The return value isn't
as expected
when reading symlink file.
For symlink file, erofs_size will return the size of the symlink itself
here.


Indeed, this is a problem, in fact I also checked other fses, it looks
like we all just return the inode size for the softlink, thus size()
call can not be relied on for those cases.

While for the read() calls, every fs will do extra resolving for soft
links, thus it doesn't cause problems.


This can still be solved by not calling size() calls at all, and only do
the unaligned read handling for the leading block.

Thank you very much for pointing this bug out, would update the patchset
for that.

Thanks,
Qu

+    if (ret < 0) {
+    log_err("** Unable to get file size for %s, %d **\n",
+    filename, ret);
+    return ret;
+    }
+    if (offset >= file_size) {
+    log_err(
+    "** Invalid offset, offset (%llu) >= file size (%llu)\n",
+    offset, file_size);
+    return -EINVAL;
+
+    }
+    if (len == 0 || offset + len > file_size) {
+    if (len > file_size)
+    log_info(
+    "** Truncate read length from %llu to %llu, as file size is %llu
**\n",
+ len, file_size, file_size);
+    len = file_size - offset;

Then, we will get a wrong len in the case of len==0. So I think we need
to do something
extra with the symlink file?

Thanks,
Jianan

+    }
  buf = map_sysmem(addr, len);
  ret = info->read(filename, buf, offset, len, actread);
  unmap_sysmem(buf);




[PATCH RFC 8/8] fs: erofs: add unaligned read range handling

2022-06-28 Thread Qu Wenruo
I'm not an expert on erofs, but my quick glance didn't expose any
special handling on unaligned range, thus I think the U-boot erofs
driver doesn't really support unaligned read range.

This patch will add erofs_get_blocksize() so erofs can benefit from the
generic unaligned read support.

Cc: Huang Jianan 
Cc: linux-er...@lists.ozlabs.org
Signed-off-by: Qu Wenruo 
---
 fs/erofs/internal.h | 1 +
 fs/erofs/super.c| 6 ++
 fs/fs.c | 2 +-
 include/erofs.h | 1 +
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4af7c91560cc..d368a6481bf1 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -83,6 +83,7 @@ struct erofs_sb_info {
u16 available_compr_algs;
u16 lz4_max_distance;
u32 checksum;
+   u32 blocksize;
u16 extra_devices;
union {
u16 devt_slotoff;   /* used for mkfs */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 4cca322b9ead..df01d2e719a7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -99,7 +99,13 @@ int erofs_read_superblock(void)
 
sbi.build_time = le64_to_cpu(dsb->build_time);
sbi.build_time_nsec = le32_to_cpu(dsb->build_time_nsec);
+   sbi.blocksize = 1 << blkszbits;
 
memcpy(, dsb->uuid, sizeof(dsb->uuid));
return erofs_init_devices(, dsb);
 }
+
+int erofs_get_blocksize(const char *filename)
+{
+   return sbi.blocksize;
+}
diff --git a/fs/fs.c b/fs/fs.c
index 41943e3a95db..0ef5fbdda5d6 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -366,7 +366,7 @@ static struct fstype_info fstypes[] = {
.readdir = erofs_readdir,
.ls = fs_ls_generic,
.read = erofs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = erofs_get_blocksize,
.size = erofs_size,
.close = erofs_close,
.closedir = erofs_closedir,
diff --git a/include/erofs.h b/include/erofs.h
index 1fbe82bf72cb..18bd6807c538 100644
--- a/include/erofs.h
+++ b/include/erofs.h
@@ -10,6 +10,7 @@ int erofs_probe(struct blk_desc *fs_dev_desc,
struct disk_partition *fs_partition);
 int erofs_read(const char *filename, void *buf, loff_t offset,
   loff_t len, loff_t *actread);
+int erofs_get_blocksize(const char *filename);
 int erofs_size(const char *filename, loff_t *size);
 int erofs_exists(const char *filename);
 void erofs_close(void);
-- 
2.36.1



[PATCH RFC 7/8] fs: ubifs: rely on higher layer to do unaligned read

2022-06-28 Thread Qu Wenruo
Currently ubifs doesn't support unaligned read offset, thanks to the
recent _fs_read() work to handle unaligned read, we only need to
implement ubifs_get_blocksize() to take advantage of it.

Now ubifs can do unaligned read without any problem.

Signed-off-by: Qu Wenruo 
---
 fs/fs.c   |  2 +-
 fs/ubifs/ubifs.c  | 13 -
 include/ubifs_uboot.h |  1 +
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index 337d5711c28c..41943e3a95db 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -302,7 +302,7 @@ static struct fstype_info fstypes[] = {
.exists = ubifs_exists,
.size = ubifs_size,
.read = ubifs_read,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ubifs_get_blocksize,
.write = fs_write_unsupported,
.uuid = fs_uuid_unsupported,
.opendir = fs_opendir_unsupported,
diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index d3026e310168..a8ab556dd376 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -846,11 +846,9 @@ int ubifs_read(const char *filename, void *buf, loff_t 
offset,
 
*actread = 0;
 
-   if (offset & (PAGE_SIZE - 1)) {
-   printf("ubifs: Error offset must be a multiple of %d\n",
-  PAGE_SIZE);
-   return -1;
-   }
+   /* Higher layer should ensure it always pass page aligned range. */
+   assert(IS_ALIGNED(offset, PAGE_SIZE));
+   assert(IS_ALIGNED(size, PAGE_SIZE));
 
c->ubi = ubi_open_volume(c->vi.ubi_num, c->vi.vol_id, UBI_READONLY);
/* ubifs_findfile will resolve symlinks, so we know that we get
@@ -920,6 +918,11 @@ out:
return err;
 }
 
+int ubifs_get_blocksize(const char *filename)
+{
+   return PAGE_SIZE;
+}
+
 void ubifs_close(void)
 {
 }
diff --git a/include/ubifs_uboot.h b/include/ubifs_uboot.h
index b025779d59ff..bcd21715314a 100644
--- a/include/ubifs_uboot.h
+++ b/include/ubifs_uboot.h
@@ -29,6 +29,7 @@ int ubifs_exists(const char *filename);
 int ubifs_size(const char *filename, loff_t *size);
 int ubifs_read(const char *filename, void *buf, loff_t offset,
   loff_t size, loff_t *actread);
+int ubifs_get_blocksize(const char *filename);
 void ubifs_close(void);
 
 #endif /* __UBIFS_UBOOT_H__ */
-- 
2.36.1



[PATCH RFC 6/8] fs: sandboxfs: add sandbox_fs_get_blocksize()

2022-06-28 Thread Qu Wenruo
This is to make sandboxfs to report blocksize it supports for
_fs_read() to handle unaligned read.

Unlike all other fses, sandboxfs can handle unaligned read/write without
any problem since it's calling read()/write(), which doesn't bother the
blocksize at all.

This change is mostly to make testing of _fs_read() much easier.

Cc: Simon Glass 
Signed-off-by: Qu Wenruo 
---
 arch/sandbox/cpu/os.c  | 11 +++
 fs/fs.c|  2 +-
 fs/sandbox/sandboxfs.c | 14 ++
 include/os.h   |  8 
 include/sandboxfs.h|  1 +
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/sandbox/cpu/os.c b/arch/sandbox/cpu/os.c
index 5ea54179176c..6c29f29bdd9b 100644
--- a/arch/sandbox/cpu/os.c
+++ b/arch/sandbox/cpu/os.c
@@ -46,6 +46,17 @@ ssize_t os_read(int fd, void *buf, size_t count)
return read(fd, buf, count);
 }
 
+ssize_t os_get_blocksize(int fd)
+{
+   struct stat stat = {0};
+   int ret;
+
+   ret = fstat(fd, );
+   if (ret < 0)
+   return -errno;
+   return stat.st_blksize;
+}
+
 ssize_t os_write(int fd, const void *buf, size_t count)
 {
return write(fd, buf, count);
diff --git a/fs/fs.c b/fs/fs.c
index 7e4ead9b790b..337d5711c28c 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -261,7 +261,7 @@ static struct fstype_info fstypes[] = {
.exists = sandbox_fs_exists,
.size = sandbox_fs_size,
.read = fs_read_sandbox,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = sandbox_fs_get_blocksize,
.write = fs_write_sandbox,
.uuid = fs_uuid_unsupported,
.opendir = fs_opendir_unsupported,
diff --git a/fs/sandbox/sandboxfs.c b/fs/sandbox/sandboxfs.c
index 4ae41d5b4db1..130fee088621 100644
--- a/fs/sandbox/sandboxfs.c
+++ b/fs/sandbox/sandboxfs.c
@@ -55,6 +55,20 @@ int sandbox_fs_read_at(const char *filename, loff_t pos, 
void *buffer,
return ret;
 }
 
+int sandbox_fs_get_blocksize(const char *filename)
+{
+   int fd;
+   int ret;
+
+   fd = os_open(filename, OS_O_RDONLY);
+   if (fd < 0)
+   return fd;
+
+   ret = os_get_blocksize(fd);
+   os_close(fd);
+   return ret;
+}
+
 int sandbox_fs_write_at(const char *filename, loff_t pos, void *buffer,
loff_t towrite, loff_t *actwrite)
 {
diff --git a/include/os.h b/include/os.h
index 10e198cf503e..a864d9ca39b2 100644
--- a/include/os.h
+++ b/include/os.h
@@ -26,6 +26,14 @@ struct sandbox_state;
  */
 ssize_t os_read(int fd, void *buf, size_t count);
 
+/**
+ * Get the optimial blocksize through stat() call.
+ *
+ * @fd:File descriptor as returned by os_open()
+ * Return: >=0 for the blocksize. <0 for error.
+ */
+ssize_t os_get_blocksize(int fd);
+
 /**
  * Access to the OS write() system call
  *
diff --git a/include/sandboxfs.h b/include/sandboxfs.h
index 783dd5c88a73..6937068f7b82 100644
--- a/include/sandboxfs.h
+++ b/include/sandboxfs.h
@@ -32,6 +32,7 @@ void sandbox_fs_close(void);
 int sandbox_fs_ls(const char *dirname);
 int sandbox_fs_exists(const char *filename);
 int sandbox_fs_size(const char *filename, loff_t *size);
+int sandbox_fs_get_blocksize(const char *filename);
 int fs_read_sandbox(const char *filename, void *buf, loff_t offset, loff_t len,
loff_t *actread);
 int fs_write_sandbox(const char *filename, void *buf, loff_t offset,
-- 
2.36.1



[PATCH RFC 5/8] fs: fat: rely on higher layer to get block aligned read range

2022-06-28 Thread Qu Wenruo
Just implement fat_get_blocksize() for fat, so that fat_read_file()
always get a block aligned read range.

Unfortunately I'm not experienced enough to cleanup the fat code, thus
further cleanup is appreciated.

Cc: Tom Rini 
Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 13 +
 fs/fs.c   |  2 +-
 include/fat.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index dcceccbcee0a..e13035e8e6d1 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1299,6 +1299,19 @@ int fat_read_file(const char *filename, void *buf, 
loff_t offset, loff_t len,
return ret;
 }
 
+int fat_get_blocksize(const char *filename)
+{
+   fsdata fsdata = {0};
+   int ret;
+
+   ret = get_fs_info();
+   if (ret)
+   return ret;
+
+   free(fsdata.fatbuf);
+   return fsdata.sect_size;
+}
+
 typedef struct {
struct fs_dir_stream parent;
struct fs_dirent dirent;
diff --git a/fs/fs.c b/fs/fs.c
index e69a0968bb6d..7e4ead9b790b 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -207,7 +207,7 @@ static struct fstype_info fstypes[] = {
.exists = fat_exists,
.size = fat_size,
.read = fat_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = fat_get_blocksize,
 #if CONFIG_IS_ENABLED(FAT_WRITE)
.write = file_fat_write,
.unlink = fat_unlink,
diff --git a/include/fat.h b/include/fat.h
index a9756fb4cd1b..c03a2bebecef 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -201,6 +201,7 @@ int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
+int fat_get_blocksize(const char *filename);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
 
-- 
2.36.1



[PATCH RFC 4/8] fs: ext4: rely on _fs_read() to pass block aligned range into ext4fs_read_file()

2022-06-28 Thread Qu Wenruo
Since _fs_read() is handling the unaligned read internally, ext4 driver
only need to handle block aligned read.

Unfortunately I'm not familiar with ext4 and its driver, thus not
confident enough to cleanup all the unaligned read related code.

So here we will have some dead code, and any help to clean them up is
appreciated.

Cc: Tom Rini 
Signed-off-by: Qu Wenruo 
---
 fs/ext4/ext4fs.c | 11 +++
 fs/fs.c  |  2 +-
 include/ext4fs.h |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4fs.c b/fs/ext4/ext4fs.c
index 4c89152ce4ad..be2680994d8b 100644
--- a/fs/ext4/ext4fs.c
+++ b/fs/ext4/ext4fs.c
@@ -71,6 +71,10 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
 
ext_cache_init();
 
+   /* Higher layer has ensured to pass block aligned range here. */
+   assert(IS_ALIGNED(pos, blocksize));
+   assert(IS_ALIGNED(len, blocksize));
+
/* Adjust len so it we can't read past the end of the file. */
if (len + pos > filesize)
len = (filesize - pos);
@@ -183,6 +187,13 @@ int ext4fs_read_file(struct ext2fs_node *node, loff_t pos,
return 0;
 }
 
+int ext4fs_get_blocksize(const char *filename)
+{
+   struct ext_filesystem *fs = get_fs();
+
+   return fs->blksz;
+}
+
 int ext4fs_ls(const char *dirname)
 {
struct ext2fs_node *dirnode = NULL;
diff --git a/fs/fs.c b/fs/fs.c
index 30696ac6c1a3..e69a0968bb6d 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -236,7 +236,7 @@ static struct fstype_info fstypes[] = {
.exists = ext4fs_exists,
.size = ext4fs_size,
.read = ext4_read_file,
-   .get_blocksize = fs_get_blocksize_unsupported,
+   .get_blocksize = ext4fs_get_blocksize,
 #ifdef CONFIG_CMD_EXT4_WRITE
.write = ext4_write_file,
.ln = ext4fs_create_link,
diff --git a/include/ext4fs.h b/include/ext4fs.h
index cb5d9cc0a5c0..cc40cfedd954 100644
--- a/include/ext4fs.h
+++ b/include/ext4fs.h
@@ -146,6 +146,7 @@ int ext4fs_create_link(const char *target, const char 
*fname);
 struct ext_filesystem *get_fs(void);
 int ext4fs_open(const char *filename, loff_t *len);
 int ext4fs_read(char *buf, loff_t offset, loff_t len, loff_t *actread);
+int ext4fs_get_blocksize(const char *filename);
 int ext4fs_mount(unsigned part_length);
 void ext4fs_close(void);
 void ext4fs_reinit_global(void);
-- 
2.36.1



[PATCH RFC 3/8] fs: btrfs: move the unaligned read code to _fs_read() for btrfs

2022-06-28 Thread Qu Wenruo
Unlike FUSE or kernel, U-boot filesystem code makes the underly fs code
to handle the unaligned read (aka, read range is not aligned to fs block
size).

This makes underlying fs code harder to implement, as unlike FUSE/kernel
code, now they have to handle unaligned read all by themselves.

This patch will change the behavior, starting from btrfs, by moving the
unaligned read code into _fs_read().

The idea is pretty simple, if we have an unaligned read request, we
handle it in the following steps:

1. Grab the blocksize of the fs

2. Read the leading unaligned range
   We will read the block that @offset is in, and copy the
   requested part into buf.

   The the block we read covers the whole range, we just call it a day.

3. Read the aligned part

4. Read the tailing unaligned range
   Pretty much the same as the leading unaligned range, just read the
   whole block where @offset + @len is, then copy the requested range
   in the buffer.

This has been tested with a proper randomly populated btrfs file, then
tried in sandbox mode with different aligned and unaligned range and
compare the output with md5sum.

Cc: Marek Behun 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/btrfs.c |  24 
 fs/btrfs/inode.c |  84 ++-
 fs/fs.c  | 148 +--
 include/btrfs.h  |   1 +
 4 files changed, 160 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 741c6e20f533..f9a67468d508 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -223,17 +223,27 @@ out:
return ret;
 }
 
+int btrfs_get_blocksize(const char *filename)
+{
+   struct btrfs_fs_info *fs_info = current_fs_info;
+
+   return fs_info->sectorsize;
+}
+
 int btrfs_read(const char *file, void *buf, loff_t offset, loff_t len,
   loff_t *actread)
 {
struct btrfs_fs_info *fs_info = current_fs_info;
struct btrfs_root *root;
-   loff_t real_size = 0;
u64 ino;
u8 type;
int ret;
 
ASSERT(fs_info);
+
+   /* Higher layer should always pass correct @len in. */
+   ASSERT(len);
+
ret = btrfs_lookup_path(fs_info->fs_root, BTRFS_FIRST_FREE_OBJECTID,
file, , , , 40);
if (ret < 0) {
@@ -246,18 +256,6 @@ int btrfs_read(const char *file, void *buf, loff_t offset, 
loff_t len,
return -EINVAL;
}
 
-   if (!len) {
-   ret = btrfs_size(file, _size);
-   if (ret < 0) {
-   error("Failed to get inode size: %s", file);
-   return ret;
-   }
-   len = real_size;
-   }
-
-   if (len > real_size - offset)
-   len = real_size - offset;
-
ret = btrfs_file_read(root, ino, offset, len, buf);
if (ret < 0) {
error("An error occurred while reading file %s", file);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d00b5153336d..5615143fab82 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -620,44 +620,6 @@ check_next:
return 1;
 }
 
-static int read_and_truncate_page(struct btrfs_path *path,
- struct btrfs_file_extent_item *fi,
- int start, int len, char *dest)
-{
-   struct extent_buffer *leaf = path->nodes[0];
-   struct btrfs_fs_info *fs_info = leaf->fs_info;
-   u64 aligned_start = round_down(start, fs_info->sectorsize);
-   u8 extent_type;
-   char *buf;
-   int page_off = start - aligned_start;
-   int page_len = fs_info->sectorsize - page_off;
-   int ret;
-
-   ASSERT(start + len <= aligned_start + fs_info->sectorsize);
-   buf = malloc_cache_aligned(fs_info->sectorsize);
-   if (!buf)
-   return -ENOMEM;
-
-   extent_type = btrfs_file_extent_type(leaf, fi);
-   if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   ret = btrfs_read_extent_inline(path, fi, buf);
-   memcpy(dest, buf + page_off, min(page_len, ret));
-   free(buf);
-   return len;
-   }
-
-   ret = btrfs_read_extent_reg(path, fi,
-   round_down(start, fs_info->sectorsize),
-   fs_info->sectorsize, buf);
-   if (ret < 0) {
-   free(buf);
-   return ret;
-   }
-   memcpy(dest, buf + page_off, page_len);
-   free(buf);
-   return len;
-}
-
 int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
char *dest)
 {
@@ -676,31 +638,12 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
/* Set the whole dest all zero, so we won't need to bother holes */
memset(dest, 0, len);
 
-   /* Read out the leading unaligned part */
-   if (aligned_start !=

[PATCH RFC 2/8] fs: always get the file size in _fs_read()

2022-06-28 Thread Qu Wenruo
For _fs_read(), @len == 0 means we read the whole file.
And we just pass @len == 0 to make each filesystem to handle it.

In fact we have info->size() call to properly get the filesize.

So we can not only call info->size() to grab the file_size for len == 0
case, but also detect invalid @len (e.g. @len > file_size) in advance or
truncate @len.

This behavior also allows us to handle unaligned better in the incoming
patches.

Signed-off-by: Qu Wenruo 
---
 fs/fs.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/fs.c b/fs/fs.c
index 6de1a3eb6d5d..d992cdd6d650 100644
--- a/fs/fs.c
+++ b/fs/fs.c
@@ -579,6 +579,7 @@ static int _fs_read(const char *filename, ulong addr, 
loff_t offset, loff_t len,
 {
struct fstype_info *info = fs_get_info(fs_type);
void *buf;
+   loff_t file_size;
int ret;
 
 #ifdef CONFIG_LMB
@@ -589,10 +590,26 @@ static int _fs_read(const char *filename, ulong addr, 
loff_t offset, loff_t len,
}
 #endif
 
-   /*
-* We don't actually know how many bytes are being read, since len==0
-* means read the whole file.
-*/
+   ret = info->size(filename, _size);
+   if (ret < 0) {
+   log_err("** Unable to get file size for %s, %d **\n",
+   filename, ret);
+   return ret;
+   }
+   if (offset >= file_size) {
+   log_err(
+   "** Invalid offset, offset (%llu) >= file size (%llu)\n",
+   offset, file_size);
+   return -EINVAL;
+
+   }
+   if (len == 0 || offset + len > file_size) {
+   if (len > file_size)
+   log_info(
+   "** Truncate read length from %llu to %llu, as file size is %llu **\n",
+len, file_size, file_size);
+   len = file_size - offset;
+   }
buf = map_sysmem(addr, len);
ret = info->read(filename, buf, offset, len, actread);
unmap_sysmem(buf);
-- 
2.36.1



[PATCH RFC 1/8] fs: fat: unexport file_fat_read_at()

2022-06-28 Thread Qu Wenruo
That function is only utilized inside fat driver, unexport it.

Signed-off-by: Qu Wenruo 
---
 fs/fat/fat.c  | 4 ++--
 include/fat.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index df9ea2c028fc..dcceccbcee0a 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -1243,8 +1243,8 @@ out_free_itr:
return ret;
 }
 
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread)
+static int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
+   loff_t maxsize, loff_t *actread)
 {
fsdata fsdata;
fat_itr *itr;
diff --git a/include/fat.h b/include/fat.h
index bd8e450b33a3..a9756fb4cd1b 100644
--- a/include/fat.h
+++ b/include/fat.h
@@ -200,8 +200,6 @@ static inline u32 sect_to_clust(fsdata *fsdata, int sect)
 int file_fat_detectfs(void);
 int fat_exists(const char *filename);
 int fat_size(const char *filename, loff_t *size);
-int file_fat_read_at(const char *filename, loff_t pos, void *buffer,
-loff_t maxsize, loff_t *actread);
 int file_fat_read(const char *filename, void *buffer, int maxsize);
 int fat_set_blk_dev(struct blk_desc *rbdd, struct disk_partition *info);
 int fat_register_device(struct blk_desc *dev_desc, int part_no);
-- 
2.36.1



[PATCH 0/8] u-boot: fs: add generic unaligned read handling

2022-06-28 Thread Qu Wenruo
[BACKGROUND]
Unlike FUSE/Kernel which always pass aligned read range, U-boot fs code
just pass the request range to underlying fses.

Under most case, this works fine, as U-boot only really needs to read
the whole file (aka, 0 for both offset and len, len will be later
determined using file size).

But if some advanced user/script wants to extract kernel/initramfs from
combined image, we may need to do unaligned read in that case.

[ADVANTAGE]
This patchset will handle unaligned read range in _fs_read():

- Get blocksize of the underlying fs

- Read the leading block contianing the unaligned range
  The full block will be stored in a local buffer, then only copy
  the bytes in the unaligned range into the destination buffer.

  If the first block covers the whole range, we just call it aday.

- Read the aligned range if there is any

- Read the tailing block containing the unaligned range
  And copy the covered range into the destination.

[DISADVANTAGE]
There are mainly two problems:

- Extra memory allocation for every _fs_read() call
  For the leading and tailing block.

- Extra path resolving
  All those supported fs will have to do extra path resolving up to 2
  times (one for the leading block, one for the tailing block).
  This may slow down the read.

[SUPPORTED FSES]

- Btrfs (manually tested*)
- Ext4 (manually tested)
- FAT (manually tested)
- Erofs
- sandboxfs
- ubifs

*: Failed to get the test cases run, thus have to go sandbox mode, and
attach an image with target fs, load the target file (with unaligned
range) and compare the result using md5sum.

For EXT4/FAT, they may need extra cleanup, as their existing unaligned
range handling is no longer needed anymore, cleaning them up should free 
more code lines than the added one.

Just not confident enough to modify them all by myself.

[UNSUPPORTED FSES]
- Squashfs
  They don't support non-zero offset, thus it can not handle the tailing
  block reading.
  Need extra help to add block aligned offset support.

- Semihostfs
  It's using hardcoded trap to do system calls, not sure how it would
  work for stat() call.

Extra testing/feedback is always appreciated.


Qu Wenruo (8):
  fs: fat: unexport file_fat_read_at()
  fs: always get the file size in _fs_read()
  fs: btrfs: move the unaligned read code to _fs_read() for btrfs
  fs: ext4: rely on _fs_read() to pass block aligned range into
ext4fs_read_file()
  fs: fat: rely on higher layer to get block aligned read range
  fs: sandboxfs: add sandbox_fs_get_blocksize()
  fs: ubifs: rely on higher layer to do unaligned read
  fs: erofs: add unaligned read range handling

 arch/sandbox/cpu/os.c  |  11 +++
 fs/btrfs/btrfs.c   |  24 +++---
 fs/btrfs/inode.c   |  84 ++--
 fs/erofs/internal.h|   1 +
 fs/erofs/super.c   |   6 ++
 fs/ext4/ext4fs.c   |  11 +++
 fs/fat/fat.c   |  17 -
 fs/fs.c| 169 +++--
 fs/sandbox/sandboxfs.c |  14 
 fs/ubifs/ubifs.c   |  13 ++--
 include/btrfs.h|   1 +
 include/erofs.h|   1 +
 include/ext4fs.h   |   1 +
 include/fat.h  |   3 +-
 include/os.h   |   8 ++
 include/sandboxfs.h|   1 +
 include/ubifs_uboot.h  |   1 +
 17 files changed, 258 insertions(+), 108 deletions(-)

-- 
2.36.1



Re: [PATCH 1/1] btrfs: simplify lookup_data_extent()

2022-05-11 Thread Qu Wenruo




On 2022/5/11 06:48, Qu Wenruo wrote:



On 2022/5/11 03:43, Heinrich Schuchardt wrote:

After returning if ret <= 0 we know that ret > 0. No need to check it.

Signed-off-by: Heinrich Schuchardt 


Reviewed-by: Qu Wenruo 


Just to mention for other guys in the btrfs list, this patch is for
U-boot btrfs implementation.

And I also checked btrfs-fuse project, which has a similar function,
lookup_file_extent(), it already does the check properly and even do
extra quick exit for (ret > 0 && path->slots[0] == 0) case.

So you may want to also check btrfs-fuse project to find some possible
optimization and cross-port to U-boot.
(So far btrfs-fuse has better test coverage using fsstress, and
cross-checked against kernel).

Thanks,
Qu


Thanks,
Qu

---
  fs/btrfs/inode.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d00b515333..0173d30cd8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -546,15 +546,12 @@ static int lookup_data_extent(struct btrfs_root
*root, struct btrfs_path *path,
  /* Error or we're already at the file extent */
  if (ret <= 0)
  return ret;
-    if (ret > 0) {
-    /* Check previous file extent */
-    ret = btrfs_previous_item(root, path, ino,
-  BTRFS_EXTENT_DATA_KEY);
-    if (ret < 0)
-    return ret;
-    if (ret > 0)
-    goto check_next;
-    }
+    /* Check previous file extent */
+    ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+    if (ret < 0)
+    return ret;
+    if (ret > 0)
+    goto check_next;
  /* Now the key.offset must be smaller than @file_offset */
  btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
  if (key.objectid != ino ||


Re: [PATCH 1/1] btrfs: simplify lookup_data_extent()

2022-05-10 Thread Qu Wenruo




On 2022/5/11 03:43, Heinrich Schuchardt wrote:

After returning if ret <= 0 we know that ret > 0. No need to check it.

Signed-off-by: Heinrich Schuchardt 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
  fs/btrfs/inode.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d00b515333..0173d30cd8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -546,15 +546,12 @@ static int lookup_data_extent(struct btrfs_root *root, 
struct btrfs_path *path,
/* Error or we're already at the file extent */
if (ret <= 0)
return ret;
-   if (ret > 0) {
-   /* Check previous file extent */
-   ret = btrfs_previous_item(root, path, ino,
- BTRFS_EXTENT_DATA_KEY);
-   if (ret < 0)
-   return ret;
-   if (ret > 0)
-   goto check_next;
-   }
+   /* Check previous file extent */
+   ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+   if (ret < 0)
+   return ret;
+   if (ret > 0)
+   goto check_next;
/* Now the key.offset must be smaller than @file_offset */
btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
if (key.objectid != ino ||


Handle unaligned read at fs layer?

2022-04-15 Thread Qu Wenruo

Hi U-boot fs guys,

With my previous rework on U-boot btrfs, and my own btrfs-fuse project,
it turns out that, although U-boot implements its fs with a very fuse
like interface, there are still some quality-of-life features missing.

One of the most obvious one is making each fs to handle unaligned read.

This is especially obvious when comparing btrfs-fuse btrfs_file_read()
against U-boot btrfs_file_read().

They share almost the same main loop, but btrfs_file_read() in U-boot
have two extra handling, for unaligned leading sector and unaligned
tailing sector.


So I just did a quick check on the other fses, it turns out that we have
a very different behaviors:

- FAT
- Ext4
  Have the handling for both tailing and ending block.

- Btrfs
  My code, doing unaligned leading/tailing block manually.
  (the tailing part looks unnecessary though)

- Ubifs
  Explicitly reject read with non-aligned offset.

- Squshfs
  Explicitly reject read with non-zero offset.

- Erofs
  Have the handling for tailing unaligned part, but doesn't seem to
  handle unaligned start part.

Thus if we have a proper common routine for unaligned read routine (get
the file length, read the first and the last unaligned block, then let
the .read callback to handle the aligned range), then all fs drivers
would have a much better life.

Mind me to do such small but quality-of-life improve for all involved fses?
(This will need a new call back to query the blocksize of each fs though)

Thanks,
Qu


Re: [PATCH] btrfs: Fix compilation on big endian systems

2022-04-07 Thread Qu Wenruo




On 2022/4/7 20:51, Pali Rohár wrote:

Fix following two compile errors on big endian systems:

   CC  fs/btrfs/btrfs.o
In file included from include/linux/byteorder/big_endian.h:107,
  from ./arch/powerpc/include/asm/byteorder.h:82,
  from ./arch/powerpc/include/asm/bitops.h:8,
  from include/linux/bitops.h:152,
  from include/uuid.h:9,
  from fs/btrfs/btrfs.c:10:
fs/btrfs/conv-funcs.h: In function ‘btrfs_key_to_disk’:
include/linux/byteorder/generic.h:90:21: error: ‘__cpu_to_le16’ undeclared 
(first use in this function); did you mean ‘__cpu_to_le16p’?
  #define cpu_to_le16 __cpu_to_le16
  ^
fs/btrfs/conv-funcs.h:79:10: note: in expansion of macro ‘cpu_to_le16’
__u16: cpu_to_le16, \
   ^~~

   CC  fs/btrfs/compression.o
In file included from ./arch/powerpc/include/asm/unaligned.h:9,
  from fs/btrfs/compression.c:16:
include/linux/unaligned/access_ok.h:6:19: error: redefinition of 
‘get_unaligned_le16’
  static inline u16 get_unaligned_le16(const void *p)
^~
In file included from fs/btrfs/ctree.h:16,
  from fs/btrfs/btrfs.h:12,
  from fs/btrfs/compression.c:8:
include/linux/unaligned/le_byteshift.h:40:19: note: previous definition of 
‘get_unaligned_le16’ was here
  static inline u16 get_unaligned_le16(const void *p)
^~

Include file asm/unaligned.h contains arch specific macros and functions
for unaligned access as opposite to linux/unaligned le_byteshift.h which
contains macros and functions specific to little endian systems only.

Signed-off-by: Pali Rohár 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
  fs/btrfs/btrfs.h | 2 +-
  fs/btrfs/ctree.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/btrfs.h b/fs/btrfs/btrfs.h
index 7d8b395b2646..a52587e0637a 100644
--- a/fs/btrfs/btrfs.h
+++ b/fs/btrfs/btrfs.h
@@ -9,7 +9,7 @@
  #define __BTRFS_BTRFS_H__

  #include 
-#include "conv-funcs.h"
+#include "ctree.h"

  extern struct btrfs_info btrfs_info;
  extern struct btrfs_fs_info *current_fs_info;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 219c410b189f..55112318a330 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -13,7 +13,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include "kernel-shared/btrfs_tree.h"
  #include "crypto/hash.h"


Re: [PATCH] fs/btrfs: fix a bug that U-boot fs btrfs implementation doesn't handle NO_HOLE feature correctly

2021-12-27 Thread Qu Wenruo




On 2021/12/27 14:11, Qu Wenruo wrote:

[BUG]
When passing a btrfs with NO_HOLE feature to U-boot, and if one file
contains holes, then the hash of the file is not correct in U-boot:

  # mkfs.btrfs -f test.img  # Since v5.15, mkfs defaults to NO_HOLES
  # mount test.img /mnt/btrfs
  # xfs_io -f -c "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/btrfs/file
  # md5sum /mnt/btrfs/file
  277f3840b275c74d01e979ea9d75ac19  /mnt/btrfs/file
  # umount /mnt/btrfs
  # ./u-boot
  => host bind 0 /home/adam/test.img
  => ls host 0
  <   >  12288  Mon Dec 27 05:35:23 2021  file
  => load host 0 0x100 file
  12288 bytes read in 0 ms
  => md5sum 0x100 0x3000
  md5 for 0100 ... 01002fff ==> 855ffdbe4d0ccc5acab92e1b5330e4c1

The md5sum doesn't match at all.

[CAUSE]
In U-boot btrfs implementation, the function btrfs_read_file() has the
following iteration for file extent iteration:

/* Read the aligned part */
while (cur < aligned_end) {
ret = lookup_data_extent(root, , ino, cur, _offset);
if (ret < 0)
goto out;
if (ret > 0) {
/* No next, direct exit */
if (!next_offset) {
ret = 0;
goto out;
}
}
/* Read file extent */

But for NO_HOLES features, hole extents will not have any extent item
for it.
Thus if @cur is at a hole, lookup_data_extent() will just return >0, and
update @next_offset.

But we still believe there is some data to read for @cur for ret > 0
case, causing we read extent data from the next file extent.

This means, what we do for above NO_HOLES btrfs is:
- Read 4K data from disk to file offset [0, 4K)
   So far the data is still correct

- Read 4K data from disk to file offset [4K, 8K)
   We didn't skip the 4K hole, but read the data at file offset [8K, 12K)
   into file offset [4K, 8K).

   This causes the checksum mismatch.

[FIX]
Add extra check to skip to the next non-hole range after
lookup_data_extent().

Signed-off-by: Qu Wenruo 
---
This bug exposed another missing link, that we don't have good test
coverage in U-boot btrfs.

This is partially caused by the fact that, btrfs-progs code is not
designed to read file contents, but just to check the cross-reference
(aka, btrfs-check).

If we really only want read-only support in U-boot, and don't ever plan
to add write support, then I'd say the btrfs-fuse project
(https://github.com/adam900710/btrfs-fuse/) is more suitable for U-boot.

As that project already has full fs content verification selftest along
with extra multi-device recovery tests.
And shares the same code style between btrfs-progs/kernel.


OK, things are not that bad.

In fact, the btrfs_read_file() implementation in btrfs-fuse has the same 
naming, same lookup_file_extent() (just a little naming different than 
lookup_data_extent()), same parameter list.


Just without the unaligned sector handling (handled by FUSE, and it may 
also be unnecessary for U-boot too), and already have the correct 
handling for lookup_file_extent(), thanks to the selftest.


So this already means, it can be pretty easy for U-boot to take code 
from btrfs-fuse part by part, without huge refactor again.


Thanks,
Qu


---
  fs/btrfs/inode.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c2379303d74..d00b5153336d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -717,6 +717,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
ret = 0;
goto out;
}
+   /*
+* Find a extent gap, mostly caused by NO_HOLE feature.
+* Just to next offset directly.
+*/
+   if (next_offset > cur) {
+   cur = next_offset;
+   continue;
+   }
}
fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
struct btrfs_file_extent_item);




[PATCH 1/2] lib: add BLAKE2 hash support

2021-12-26 Thread Qu Wenruo
The code is cross-ported from BLAKE2 reference implementation
(https://github.com/BLAKE2/BLAKE2).

With minimal change to remove unused macros/features.

Currently there is only one user inside U-boot (btrfs), and since it
only utilize BLAKE2B, all other favors are all removed.

Signed-off-by: Qu Wenruo 
---
 include/u-boot/blake2.h  |  94 
 lib/Kconfig  |   8 +
 lib/Makefile |   1 +
 lib/blake2/blake2-impl.h | 163 
 lib/blake2/blake2b.c | 311 +++
 5 files changed, 577 insertions(+)
 create mode 100644 include/u-boot/blake2.h
 create mode 100644 lib/blake2/blake2-impl.h
 create mode 100644 lib/blake2/blake2b.c

diff --git a/include/u-boot/blake2.h b/include/u-boot/blake2.h
new file mode 100644
index ..4984023a1a2d
--- /dev/null
+++ b/include/u-boot/blake2.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: CC0-1.0 */
+/*
+   BLAKE2 reference source code package - reference C implementations
+
+   Copyright 2012, Samuel Neves .  You may use this under the
+   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
+   your option.  The terms of these licenses can be found at:
+
+   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
+   - OpenSSL license   : https://www.openssl.org/source/license.html
+   - Apache 2.0: http://www.apache.org/licenses/LICENSE-2.0
+
+   More information about the BLAKE2 hash function can be found at
+   https://blake2.net.
+*/
+
+/*
+ * Cross-ported from BLAKE2 (https://github.com/BLAKE2/BLAKE2).
+ * Modifications includes:
+ *
+ * - Remove unsupported compilers like MSC/CPP
+ * - Remove blake2s/blake2sp/blake2bp
+ *   This blake2 implementation is mostly for btrfs, which only utilizes
+ *   blake2b.
+ */
+#ifndef BLAKE2_H
+#define BLAKE2_H
+
+#include 
+#include 
+
+#if defined(_MSC_VER)
+#define BLAKE2_PACKED(x) __pragma(pack(push, 1)) x __pragma(pack(pop))
+#else
+#define BLAKE2_PACKED(x) x __attribute__((packed))
+#endif
+
+  enum blake2b_constant
+  {
+BLAKE2B_BLOCKBYTES = 128,
+BLAKE2B_OUTBYTES   = 64,
+BLAKE2B_KEYBYTES   = 64,
+BLAKE2B_SALTBYTES  = 16,
+BLAKE2B_PERSONALBYTES = 16
+  };
+
+  typedef struct blake2b_state__
+  {
+uint64_t h[8];
+uint64_t t[2];
+uint64_t f[2];
+uint8_t  buf[BLAKE2B_BLOCKBYTES];
+size_t   buflen;
+size_t   outlen;
+uint8_t  last_node;
+  } blake2b_state;
+
+  BLAKE2_PACKED(struct blake2b_param__
+  {
+uint8_t  digest_length; /* 1 */
+uint8_t  key_length;/* 2 */
+uint8_t  fanout;/* 3 */
+uint8_t  depth; /* 4 */
+uint32_t leaf_length;   /* 8 */
+uint32_t node_offset;   /* 12 */
+uint32_t xof_length;/* 16 */
+uint8_t  node_depth;/* 17 */
+uint8_t  inner_length;  /* 18 */
+uint8_t  reserved[14];  /* 32 */
+uint8_t  salt[BLAKE2B_SALTBYTES]; /* 48 */
+uint8_t  personal[BLAKE2B_PERSONALBYTES];  /* 64 */
+  });
+
+  typedef struct blake2b_param__ blake2b_param;
+
+  /* Padded structs result in a compile-time error */
+  enum {
+BLAKE2_DUMMY_2 = 1/(int)(sizeof(blake2b_param) == BLAKE2B_OUTBYTES)
+  };
+
+  /* Streaming API */
+  int blake2b_init( blake2b_state *S, size_t outlen );
+  int blake2b_init_key( blake2b_state *S, size_t outlen, const void *key, 
size_t keylen );
+  int blake2b_init_param( blake2b_state *S, const blake2b_param *P );
+  int blake2b_update( blake2b_state *S, const void *in, size_t inlen );
+  int blake2b_final( blake2b_state *S, void *out, size_t outlen );
+
+  /* Simple API */
+  int blake2b( void *out, size_t outlen, const void *in, size_t inlen, const 
void *key, size_t keylen );
+
+  /* This is simply an alias for blake2b */
+  int blake2( void *out, size_t outlen, const void *in, size_t inlen, const 
void *key, size_t keylen );
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 807a4c6ade06..e9976c7425e2 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -357,6 +357,14 @@ endmenu
 
 menu "Hashing Support"
 
+config BLAKE2
+   bool "Enable BLAKE2 support"
+   help
+ This option enables support of hashing using BLAKE2B algorithm.
+ The hash is calculated in software.
+ The BLAKE2 algorithm produces a hash value (digest) between 1 and
+ 64 bytes.
+
 config SHA1
bool "Enable SHA1 support"
help
diff --git a/lib/Makefile b/lib/Makefile
index 5ddbc77ed6d8..7950e847a9e2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_$(SPL_)MD5) += md5.o
 obj-$(CONFIG_ECDSA) += ecdsa/
 obj-$(CONFIG_$(SPL_)RSA) += rsa/
 obj-$(CONFIG_HASH) += hash-checksum.o
+obj-$(CONFIG_BLAKE2) += blake2/blake2b.o
 obj-$(CONFIG_SHA1) += sha1.o
 obj-$(CONFIG_SHA256) += sha256.o
 obj-$(CONFIG_SHA512) += sha512.o
diff --git a/lib/blake2/blake2-impl.h b/lib/blake2/blake2-impl.h
new file mode 100644
index ..7ee52fdf5ada
--- /dev/null
+++ b/lib/blake2/blake2-impl.h
@@ -

[PATCH 2/2] fs/btrfs: add dependency on BLAKE2 hash

2021-12-26 Thread Qu Wenruo
Now btrfs can utilize the newly intorudced BLAKE2 hash.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/Kconfig   |  1 +
 fs/btrfs/crypto/hash.c | 14 ++
 fs/btrfs/crypto/hash.h |  1 +
 fs/btrfs/disk-io.c |  2 ++
 4 files changed, 18 insertions(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index a0b48c23b31a..e31afe595f31 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -6,6 +6,7 @@ config FS_BTRFS
select ZSTD
select RBTREE
select SHA256
+   select BLAKE2
help
  This provides a single-device read-only BTRFS support. BTRFS is a
  next-generation Linux file system based on the copy-on-write
diff --git a/fs/btrfs/crypto/hash.c b/fs/btrfs/crypto/hash.c
index fb51f6386cb1..891a2974be05 100644
--- a/fs/btrfs/crypto/hash.c
+++ b/fs/btrfs/crypto/hash.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static u32 btrfs_crc32c_table[256];
@@ -39,6 +40,19 @@ int hash_xxhash(const u8 *buf, size_t length, u8 *out)
return 0;
 }
 
+/* We use the full CSUM_SIZE(32) for BLAKE2B */
+#define BTRFS_BLAKE2_HASH_SIZE 32
+int hash_blake2(const u8 *buf, size_t length, u8 *out)
+{
+   blake2b_state S;
+
+   blake2b_init(, BTRFS_BLAKE2_HASH_SIZE);
+   blake2b_update(, buf, length);
+   blake2b_final(, out, BTRFS_BLAKE2_HASH_SIZE);
+
+   return 0;
+}
+
 int hash_crc32c(const u8 *buf, size_t length, u8 *out)
 {
u32 crc;
diff --git a/fs/btrfs/crypto/hash.h b/fs/btrfs/crypto/hash.h
index d1ba1fa374e3..f9846038bfa3 100644
--- a/fs/btrfs/crypto/hash.h
+++ b/fs/btrfs/crypto/hash.h
@@ -9,6 +9,7 @@ void btrfs_hash_init(void);
 int hash_crc32c(const u8 *buf, size_t length, u8 *out);
 int hash_xxhash(const u8 *buf, size_t length, u8 *out);
 int hash_sha256(const u8 *buf, size_t length, u8 *out);
+int hash_blake2(const u8 *buf, size_t length, u8 *out);
 
 u32 crc32c(u32 seed, const void * data, size_t len);
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb7736d53e11..8043abc1bd60 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -126,6 +126,8 @@ int btrfs_csum_data(u16 csum_type, const u8 *data, u8 *out, 
size_t len)
return hash_xxhash(data, len, out);
case BTRFS_CSUM_TYPE_SHA256:
return hash_sha256(data, len, out);
+   case BTRFS_CSUM_TYPE_BLAKE2:
+   return hash_blake2(data, len, out);
default:
printf("Unknown csum type %d\n", csum_type);
return -EINVAL;
-- 
2.34.1



[PATCH] fs/btrfs: fix a bug that U-boot fs btrfs implementation doesn't handle NO_HOLE feature correctly

2021-12-26 Thread Qu Wenruo
[BUG]
When passing a btrfs with NO_HOLE feature to U-boot, and if one file
contains holes, then the hash of the file is not correct in U-boot:

 # mkfs.btrfs -f test.img   # Since v5.15, mkfs defaults to NO_HOLES
 # mount test.img /mnt/btrfs
 # xfs_io -f -c "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/btrfs/file
 # md5sum /mnt/btrfs/file
 277f3840b275c74d01e979ea9d75ac19  /mnt/btrfs/file
 # umount /mnt/btrfs
 # ./u-boot
 => host bind 0 /home/adam/test.img
 => ls host 0
 <   >  12288  Mon Dec 27 05:35:23 2021  file
 => load host 0 0x100 file
 12288 bytes read in 0 ms
 => md5sum 0x100 0x3000
 md5 for 0100 ... 01002fff ==> 855ffdbe4d0ccc5acab92e1b5330e4c1

The md5sum doesn't match at all.

[CAUSE]
In U-boot btrfs implementation, the function btrfs_read_file() has the
following iteration for file extent iteration:

/* Read the aligned part */
while (cur < aligned_end) {
ret = lookup_data_extent(root, , ino, cur, _offset);
if (ret < 0)
goto out;
if (ret > 0) {
/* No next, direct exit */
if (!next_offset) {
ret = 0;
goto out;
}
}
/* Read file extent */

But for NO_HOLES features, hole extents will not have any extent item
for it.
Thus if @cur is at a hole, lookup_data_extent() will just return >0, and
update @next_offset.

But we still believe there is some data to read for @cur for ret > 0
case, causing we read extent data from the next file extent.

This means, what we do for above NO_HOLES btrfs is:
- Read 4K data from disk to file offset [0, 4K)
  So far the data is still correct

- Read 4K data from disk to file offset [4K, 8K)
  We didn't skip the 4K hole, but read the data at file offset [8K, 12K)
  into file offset [4K, 8K).

  This causes the checksum mismatch.

[FIX]
Add extra check to skip to the next non-hole range after
lookup_data_extent().

Signed-off-by: Qu Wenruo 
---
This bug exposed another missing link, that we don't have good test
coverage in U-boot btrfs.

This is partially caused by the fact that, btrfs-progs code is not
designed to read file contents, but just to check the cross-reference
(aka, btrfs-check).

If we really only want read-only support in U-boot, and don't ever plan
to add write support, then I'd say the btrfs-fuse project
(https://github.com/adam900710/btrfs-fuse/) is more suitable for U-boot.

As that project already has full fs content verification selftest along
with extra multi-device recovery tests.
And shares the same code style between btrfs-progs/kernel.
---
 fs/btrfs/inode.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c2379303d74..d00b5153336d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -717,6 +717,14 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 
file_offset, u64 len,
ret = 0;
goto out;
}
+   /*
+* Find a extent gap, mostly caused by NO_HOLE feature.
+* Just to next offset directly.
+*/
+   if (next_offset > cur) {
+   cur = next_offset;
+   continue;
+   }
}
fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
struct btrfs_file_extent_item);
-- 
2.34.1



Re: U-Boot btrfs implementation - will there by syncing?

2021-12-16 Thread Qu Wenruo




On 2021/12/17 07:45, Qu Wenruo wrote:



On 2021/12/17 07:05, Marek Behún wrote:

Hello Qu and others,

one of the points of Qu's reimplementation of U-Boot's btrfs code that
made it somehow synced with btrfs-progs, was that the old
implementation was, quote

   pretty different from btrfs-progs nor kernel, making it pretty hard to
   sync code between different projects

Another point was that afterwards Qu wanted to bring some of the missing
btrfs features into U-Boot, since they should be much easier to
implement after the reimplementation.

I was looking at current btrfs-progs, and noticed that there were many
changes since then, but only one or two were ever synced to U-Boot.


Yep, those changes are mostly preparation for the incoming extent-tree
v2 feature, they bring no real behavior change for now.

And I'm planning to cross-port them just to make later cross-port easier.


Forgot to mention that, a lot of btrfs-progs changes are not to the core
implementation, but ioctls/user-interafaces/check/image tools, thus you
may need to do more filtering to figure out what's really needed to be
backported.

But I may not cover all changes, so if you find some aspects missing,
feel free to point it out, or even better, try to crossport if possible.



Thanks for the reminder.



I would like to know if Qu, or anyone else, is planning to do this
code-syncing, and maybe implement some of the missing features?


Some missing features, are blocked by lacking of interfaces and knoledge.

Currently the missing hash (BLAKE2) needs BLAKE2 support in U-boot,
which is still not yet in U-boot.

If really needed, I may crossport BLAKE2B to Uboot then implement the
support in btrfs.


The similar case is for multi-device support, I'm not familiar enough on
how to iterate through all block devices in Uboot.

If there is already such interfaces or someone could provide some help,
I'm pretty happy to implement it in Uboot.



The reason I am asking this is that in the meantime I've reached the
opinion that the idea of code-syncing in this sense almost never really
works. It is usually proposed by one person who ports the code from
another project, and then keeps it synced for a time, but then stops
doing it because they have too much other work, or change employer, or
another reason.


Personally speaking, my current employer (SUSE) is super awesome, and
there is no plan to move to another employer.

Furthermore, SUSE is a super strong supporter for btrfs, thus
maintaining a common base to provide better btrfs support is definitely
benefiting SUSE.

Thus you could count on such long term support, and personally speaking
I'm pretty happy to work on btrfs, even not paid for.

So the long term support, at least from myself, should be something you
can rely on.

Although there are indeed cases where I'm buried by other btrfs work,
thus such reminder is really helpful.

Thanks,
Qu



It happened multiple times. One example is U-Boot's Hush shell, which
was imported from busybox, with the intention to keep it in sync, but
the reality is that U-Boot's implementation is now years (decade?) old,
and there were so many ad-hoc fixes done in U-Boot that currently the
only way to sync is basically to change the whole code (which will also
cause issues with existing user scripts, but maybe it should be done
anyway).

Another reason why I am writing this is that I think that with the
amount of work Qu put into that reimplementation last year, he could
have instead implemented some of the new features in the old
implementation and spend a similar amount of time doing that, and the
result would be those new features in U-Boot for over a year by now.


I doubt that, the original interface is completely different than the
interface we have in kernel/progs.

That means, if one wants to cross port the recent extent-tree v2
changes, it will be a hell.

So in the long run, it's still beneficial, even they get de-synced somehow.



Also, I've come to the opinion that maybe two different and maintained
implementations are a good thing for such a project as a filesystem,
similar to how multiple implementations of C/C++ compiler are a good
thing.

For the last few months I was on-and-off thinking about whether it
would make sense to revert to the original implementation and then
implement some of the missing features. But it would not make sense if
Qu, or anyone else, is planning to do that code syncing and bringing of
new features from btrfs-progs, so that's why I am asking.


The worst example for different implementations are another bootloader,
GRUB.

Its btrfs implementation has way less features, no sanity checks, and is
super easy to craft an image to cause infinite loop for GRUB.
And it even can not handle the new default mkfs features (NO_HOLES,
which is there for long long time).

Thus I even re-implemented a new license compatible project from
scratch, btrfs-fuse, to provide a basis for later cross-port.
(That only has read-only

Re: U-Boot btrfs implementation - will there by syncing?

2021-12-16 Thread Qu Wenruo




On 2021/12/17 07:05, Marek Behún wrote:

Hello Qu and others,

one of the points of Qu's reimplementation of U-Boot's btrfs code that
made it somehow synced with btrfs-progs, was that the old
implementation was, quote

   pretty different from btrfs-progs nor kernel, making it pretty hard to
   sync code between different projects

Another point was that afterwards Qu wanted to bring some of the missing
btrfs features into U-Boot, since they should be much easier to
implement after the reimplementation.

I was looking at current btrfs-progs, and noticed that there were many
changes since then, but only one or two were ever synced to U-Boot.


Yep, those changes are mostly preparation for the incoming extent-tree
v2 feature, they bring no real behavior change for now.

And I'm planning to cross-port them just to make later cross-port easier.

Thanks for the reminder.



I would like to know if Qu, or anyone else, is planning to do this
code-syncing, and maybe implement some of the missing features?

The reason I am asking this is that in the meantime I've reached the
opinion that the idea of code-syncing in this sense almost never really
works. It is usually proposed by one person who ports the code from
another project, and then keeps it synced for a time, but then stops
doing it because they have too much other work, or change employer, or
another reason.

It happened multiple times. One example is U-Boot's Hush shell, which
was imported from busybox, with the intention to keep it in sync, but
the reality is that U-Boot's implementation is now years (decade?) old,
and there were so many ad-hoc fixes done in U-Boot that currently the
only way to sync is basically to change the whole code (which will also
cause issues with existing user scripts, but maybe it should be done
anyway).

Another reason why I am writing this is that I think that with the
amount of work Qu put into that reimplementation last year, he could
have instead implemented some of the new features in the old
implementation and spend a similar amount of time doing that, and the
result would be those new features in U-Boot for over a year by now.


I doubt that, the original interface is completely different than the
interface we have in kernel/progs.

That means, if one wants to cross port the recent extent-tree v2
changes, it will be a hell.

So in the long run, it's still beneficial, even they get de-synced somehow.



Also, I've come to the opinion that maybe two different and maintained
implementations are a good thing for such a project as a filesystem,
similar to how multiple implementations of C/C++ compiler are a good
thing.

For the last few months I was on-and-off thinking about whether it
would make sense to revert to the original implementation and then
implement some of the missing features. But it would not make sense if
Qu, or anyone else, is planning to do that code syncing and bringing of
new features from btrfs-progs, so that's why I am asking.


The worst example for different implementations are another bootloader,
GRUB.

Its btrfs implementation has way less features, no sanity checks, and is
super easy to craft an image to cause infinite loop for GRUB.
And it even can not handle the new default mkfs features (NO_HOLES,
which is there for long long time).

Thus I even re-implemented a new license compatible project from
scratch, btrfs-fuse, to provide a basis for later cross-port.
(That only has read-only support, and is mostly MIT licesnsed, the
interface is almost the same as progs/kernel).

It may be possible for simpler filesystems, but for complex filesysmtes
like btrfs, a completely different implementation with completely
different interfaces is really not an ideal situation.

I don't really want to do it again and again, using different (and
sometimes very crappy) interfaces to do the same work.

Thanks,
Qu



Please give your opinions, people.

Thanks.

Marek


Re: [PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory

2021-09-17 Thread Qu Wenruo




On 2021/9/17 22:25, Mark Kettenis wrote:

Date: Fri, 17 Sep 2021 19:42:29 +0800
Content-Language: en-US

On 2021/9/17 19:34, Mark Kettenis wrote:

From: Qu Wenruo 
Date: Fri, 17 Sep 2021 19:02:35 +0800

When booting using U-boot -> systemd-boot (EFI payload) -> kernel on
RK3399, systemd-boot by some bug can't execute its "devicetree" key
correctly to load its proper dtb from files.

In that case, it will use fallback dtb from U-boot, which can be
out-of-date, and on RK3399, even the latest U-boot contains out-of-date
dtb which can cause problems like invalid opp tables.

And for systemd-boot, it doesn't provide any board specific dtb, but
completely relies on the EFI environment provided by U-boot, thus if we
can't find a good dtb, the fallback one will be used anyway.

So this patch will workaround the problem by appending common linux dtbs
directory to the existing "efi_dtb_prefixes" so that for systemd-boot it
can use the existing fdt and boot properly.


Why isn't the dtb installed in the standard location?


Isn't standard location "/dtbs" other than "/dtb"?

At least kernel puts its dtbs into "/dtbs" by default.


Hmm, well, when I Alexander Graf introduced the code to load an
updated DTB the intention was clearly to load it from the EFI System
Partition (ESP).  I doubt that's where "the kernel" sticks them by
default.  So I suspect the intention is that you copy the DTB to the
/dtb directory on the ESP if needed.  Clearly that isn't what you've
done on your system.


Well, for this U-boot > systemd-boot > kernel loading scheme,
systemd-boot is responsible to load the fdt.

But for now (the latest stable release), it doesn't support fdt loading
at all, and only recently get the ability:
https://github.com/systemd/systemd/pull/19417

So in theory, it doesn't need any fdt for systemd-boot.

It's more like a workaround (and not really needed now).



I suspect things have moved on a bit as the discussion in the systemd
thread shows.  One of the new developments is the EFI_DT_FIXUP_PROCOL
feature that is clearly targeted at EFI bootloaders picking and
loading the DT instead of U-Boot.


Yep, I didn't notice the development in systemd part.



In this light I would be somewhat reluctant adding more directories to
efi_dtb_prefixes.


Yeah, please discard the patch, it's really a workaround and not needed
anymore.

Thanks,
Qu




Slowing the boot process down because distro's can't agree where to
put the files is a bit lame...


Would it be better to make the search path configurable at config time?

By that each distro should config their dtbs search path, which could
further optimize the boot sequence by removing all other unnecessary
prefixes.


Not in my opinion.  I consider it a bad thing if people configure
U-Boot specifically for their distro.  The whole idea of distroboot
and EFI in U-Boot is to be able to boot just any OS with a generic
U-Boot binary.


Thanks,
Qu




Signed-off-by: Qu Wenruo 
---
   include/config_distro_bootcmd.h | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h
index 2627c2a6a541..4ec87483eb65 100644
--- a/include/config_distro_bootcmd.h
+++ b/include/config_distro_bootcmd.h
@@ -151,7 +151,7 @@
"load ${devtype} ${devnum}:${distro_bootpart} "   \
"${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \
\
-   "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\
+   "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0"  \
"scan_dev_for_efi="   \
"setenv efi_fdtfile ${fdtfile}; " \
BOOTENV_EFI_SET_FDTFILE_FALLBACK  \
--
2.33.0






Re: [systemd-devel] Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)

2021-09-17 Thread Qu Wenruo




On 2021/9/17 19:45, Lennart Poettering wrote:

On Fr, 17.09.21 19:25, Qu Wenruo (w...@suse.com) wrote:


Hi,

I'm recently testing booting my RK3399 boards with the following boot
sequence:

U-boot -> systemd-boot (EFI payload) -> kernel

Which provides much more flex than plain extlinux conf from U-boot.
(More choice, easier to write config, runtime kernel change).

So far "kernel" and "initramfs" key work fine.

But I notice that "devicetree" key is not working properly.

The Uboot fdt search path doesn't include "/dtbs" which is used by my
distro, and my entry config specify the device-tree file like this:

titleManjaroARM boot from nvme
linux/Image
devicetree/dtbs/rockchip/rk3399-rockpro64.dtb
initrd/initramfs-linux.img
optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7

Thus if systemd-boot doesn't load the correct device-tree, kernel will
use the default fdt passed from Uboot, which is already out-of-date and
can cause problems for the upstream kernel I used.

Unfortunately, with above config, after booting the kernel, the fdt is
the fallback one from Uboot, not loading the proper one specified by
systemd-boot config.

The proof I went is checking the opp table.
I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom
dtb which uses op1 tables.
But the kernel only sees a very out-of-dated fdt, which some opp is even
invalid.

How could I continue debugging the missing link?
Like what systemd-boot needs to load the device-tree? Or U-boot EFI
environment lacks certain facility to support systemd-boot?


Did you see this:

https://github.com/systemd/systemd/pull/19417


Confirmed this pull fixes the problem.

I only need to wait for next release to get it from my distro.

Awesome!

Thanks,
Qu



(and maybe this: https://github.com/systemd/systemd/pull/20601)

maybe that addresses your issues?

Lennart

--
Lennart Poettering, Berlin





Re: [systemd-devel] Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)

2021-09-17 Thread Qu Wenruo




On 2021/9/17 19:45, Lennart Poettering wrote:

On Fr, 17.09.21 19:25, Qu Wenruo (w...@suse.com) wrote:


Hi,

I'm recently testing booting my RK3399 boards with the following boot
sequence:

U-boot -> systemd-boot (EFI payload) -> kernel

Which provides much more flex than plain extlinux conf from U-boot.
(More choice, easier to write config, runtime kernel change).

So far "kernel" and "initramfs" key work fine.

But I notice that "devicetree" key is not working properly.

The Uboot fdt search path doesn't include "/dtbs" which is used by my
distro, and my entry config specify the device-tree file like this:

titleManjaroARM boot from nvme
linux/Image
devicetree/dtbs/rockchip/rk3399-rockpro64.dtb
initrd/initramfs-linux.img
optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7

Thus if systemd-boot doesn't load the correct device-tree, kernel will
use the default fdt passed from Uboot, which is already out-of-date and
can cause problems for the upstream kernel I used.

Unfortunately, with above config, after booting the kernel, the fdt is
the fallback one from Uboot, not loading the proper one specified by
systemd-boot config.

The proof I went is checking the opp table.
I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom
dtb which uses op1 tables.
But the kernel only sees a very out-of-dated fdt, which some opp is even
invalid.

How could I continue debugging the missing link?
Like what systemd-boot needs to load the device-tree? Or U-boot EFI
environment lacks certain facility to support systemd-boot?


Did you see this:

https://github.com/systemd/systemd/pull/19417

(and maybe this: https://github.com/systemd/systemd/pull/20601)


Awesome! Let me try these PRs and report back!

Thanks for the super-fast mention!
Qu



maybe that addresses your issues?

Lennart

--
Lennart Poettering, Berlin





Re: [PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory

2021-09-17 Thread Qu Wenruo




On 2021/9/17 19:34, Mark Kettenis wrote:

From: Qu Wenruo 
Date: Fri, 17 Sep 2021 19:02:35 +0800

When booting using U-boot -> systemd-boot (EFI payload) -> kernel on
RK3399, systemd-boot by some bug can't execute its "devicetree" key
correctly to load its proper dtb from files.

In that case, it will use fallback dtb from U-boot, which can be
out-of-date, and on RK3399, even the latest U-boot contains out-of-date
dtb which can cause problems like invalid opp tables.

And for systemd-boot, it doesn't provide any board specific dtb, but
completely relies on the EFI environment provided by U-boot, thus if we
can't find a good dtb, the fallback one will be used anyway.

So this patch will workaround the problem by appending common linux dtbs
directory to the existing "efi_dtb_prefixes" so that for systemd-boot it
can use the existing fdt and boot properly.


Why isn't the dtb installed in the standard location?


Isn't standard location "/dtbs" other than "/dtb"?

At least kernel puts its dtbs into "/dtbs" by default.



Slowing the boot process down because distro's can't agree where to
put the files is a bit lame...


Would it be better to make the search path configurable at config time?

By that each distro should config their dtbs search path, which could
further optimize the boot sequence by removing all other unnecessary
prefixes.

Thanks,
Qu




Signed-off-by: Qu Wenruo 
---
  include/config_distro_bootcmd.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h
index 2627c2a6a541..4ec87483eb65 100644
--- a/include/config_distro_bootcmd.h
+++ b/include/config_distro_bootcmd.h
@@ -151,7 +151,7 @@
"load ${devtype} ${devnum}:${distro_bootpart} "   \
"${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \
\
-   "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\
+   "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0"  \
"scan_dev_for_efi="   \
"setenv efi_fdtfile ${fdtfile}; " \
BOOTENV_EFI_SET_FDTFILE_FALLBACK  \
--
2.33.0




Systemd-boot not properly loading device tree, when loaded by U-boot (ARM64, tested on RK3399)

2021-09-17 Thread Qu Wenruo

Hi,

I'm recently testing booting my RK3399 boards with the following boot
sequence:

U-boot -> systemd-boot (EFI payload) -> kernel

Which provides much more flex than plain extlinux conf from U-boot.
(More choice, easier to write config, runtime kernel change).

So far "kernel" and "initramfs" key work fine.

But I notice that "devicetree" key is not working properly.

The Uboot fdt search path doesn't include "/dtbs" which is used by my
distro, and my entry config specify the device-tree file like this:

titleManjaroARM boot from nvme
linux/Image
devicetree/dtbs/rockchip/rk3399-rockpro64.dtb
initrd/initramfs-linux.img
optionsconsole=ttyS2,150 root=/dev/arm_nvme/root rw loglevel=7

Thus if systemd-boot doesn't load the correct device-tree, kernel will
use the default fdt passed from Uboot, which is already out-of-date and
can cause problems for the upstream kernel I used.

Unfortunately, with above config, after booting the kernel, the fdt is
the fallback one from Uboot, not loading the proper one specified by
systemd-boot config.

The proof I went is checking the opp table.
I have replaced the "/dtbs/rockchip/rk3399-rockpro64.dtb" with a custom
dtb which uses op1 tables.
But the kernel only sees a very out-of-dated fdt, which some opp is even
invalid.

How could I continue debugging the missing link?
Like what systemd-boot needs to load the device-tree? Or U-boot EFI
environment lacks certain facility to support systemd-boot?

Thanks,
Qu



[PATCH] distro_bootcmd: change the default dtb search path to include default kernel dtbs directory

2021-09-17 Thread Qu Wenruo
When booting using U-boot -> systemd-boot (EFI payload) -> kernel on
RK3399, systemd-boot by some bug can't execute its "devicetree" key
correctly to load its proper dtb from files.

In that case, it will use fallback dtb from U-boot, which can be
out-of-date, and on RK3399, even the latest U-boot contains out-of-date
dtb which can cause problems like invalid opp tables.

And for systemd-boot, it doesn't provide any board specific dtb, but
completely relies on the EFI environment provided by U-boot, thus if we
can't find a good dtb, the fallback one will be used anyway.

So this patch will workaround the problem by appending common linux dtbs
directory to the existing "efi_dtb_prefixes" so that for systemd-boot it
can use the existing fdt and boot properly.

Signed-off-by: Qu Wenruo 
---
 include/config_distro_bootcmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/config_distro_bootcmd.h b/include/config_distro_bootcmd.h
index 2627c2a6a541..4ec87483eb65 100644
--- a/include/config_distro_bootcmd.h
+++ b/include/config_distro_bootcmd.h
@@ -151,7 +151,7 @@
"load ${devtype} ${devnum}:${distro_bootpart} "   \
"${fdt_addr_r} ${prefix}${efi_fdtfile}\0" \
\
-   "efi_dtb_prefixes=/ /dtb/ /dtb/current/\0"\
+   "efi_dtb_prefixes=/ /dtb/ /dtb/current/ /dtbs/ /dtbs/current/\0"  \
"scan_dev_for_efi="   \
"setenv efi_fdtfile ${fdtfile}; " \
BOOTENV_EFI_SET_FDTFILE_FALLBACK  \
-- 
2.33.0



Way to change $scan_dev_for_boot search sequence?

2021-09-09 Thread Qu Wenruo

Hi,

Is there any way to change the $scan_dev_for_boot search sequence?

Currently it always go extlinux first, then scripts, and finally EFI.


Normally this is completely fine, as one system should only have one of 
these, thus search sequence makes no difference.



But for certain distros, like Manjaro ARM, may migrate from current 
extlinux config to UEFI payload (like GRUB or systemd-boot, to provide a 
better way to choose from different boot configs).



Thus current extlinux->scritps->EFI search priority makes it much harder 
to make EFI to be the default boot option.



It looks like config_distro_bootcmd.h is here to define the sequence, 
and I see no config option to modify that.


So I guess if we want to config the sequence, we have to modify that 
header, right?


Thanks,
Qu



Re: [PATCH] btrfs: Use default subvolume as filesystem root

2021-09-01 Thread Qu Wenruo




On 2021/9/1 下午9:48, Matthias Brugger wrote:



On 01/09/2021 13:36, Tom Rini wrote:

On Wed, Sep 01, 2021 at 01:28:30PM +0200, Matthias Brugger wrote:

Hi Tom,

On 02/08/2021 01:06, Qu Wenruo wrote:



On 2021/8/2 上午4:52, Matwey V. Kornilov wrote:

BTRFS volume consists of a number of subvolumes which can be mounted separately
from each other. The top-level subvolume always exists even if no subvolumes
were created manually. A subvolume can be denoted as the default subvolume i.e.
the subvolume which is mounted by default.

The default "default subvolume" is the top-level one, but this is far from the
common practices used in the wild. For instance, openSUSE provides an OS
snapshot/rollback feature based on BTRFS. To achieve this, the actual OS root
filesystem is located into a separate subvolume which is "default" but not
"top-level". That means that the /boot/dtb/ directory is also located inside
this default subvolume instead of top-level one.

However, the existing btrfs u-boot driver always uses the top-level subvolume
as the filesystem root. This behaviour 1) is inconsistent with

  mount /dev/sda1 /target

command, which mount the default subvolume 2) leads to the issues when
/boot/dtb cannot be found properly (see the reference).


I also noticed the problem in the past, but forgot to fix it



This patch uses the default subvolume as the filesystem root to overcome
mentioned issues.

Reference: https://bugzilla.suse.com/show_bug.cgi?id=1185656
Signed-off-by: Matwey V. Kornilov 


Reviewed-by: Qu Wenruo 



I can see that this patch is marked in your patchwork queue as "Need Review /
ACK". Qu is one of our core btrfs developer who reviewed the patch. Apart from
that we have it running on openSUSE on top of v2021.07 for some time without any
issues.


Ah, yup.  Qu is one of the people I do look for to have reviewed a btrfs
patch before I apply it (and I throw things under Need Review / ACK as a
note-to-self to make sure a patch does have one, when I can expect one,
before applying, FWIW).


Would it be possible to merge this for v2021.10 or do you see any blocker here?


I think I had mentally filed it was feature not bugfix and was going to
hold off, but since you're asking, yes, I can grab it for this release.
Thanks!



It's a bugfix, as loading files from BTRFS, at least in openSUSE does not work
without it.


Just to be extra accurate, the original code always accesses subvol 5 as 
the entrance point, thus for openSUSE or any other distros setting other 
default subvolume, this will cause behavior difference between kernel 
and u-boot, thus unable to find the correct /boot path.


This problem is cause by my rework using btrfs-progs code. All my fault.

Btrfs-progs never needs to bother the default subvolume, as it has no 
way to "mount" the fs nor explore the directories/files structure.


Thus it's a regression, and I'm not sure if we should use Fixes: tag.

Just in case, the fixes tag is:

Fixes: f06bfcf54d0e ("fs: btrfs: Crossport open_ctree_fs_info() from 
btrfs-progs")


Thank all guys involved in this case.
Qu



Thanks for the quick answer!
Matthias





Re: [PATCH 04/11] btrfs: Suppress the message about missing filesystem

2021-08-19 Thread Qu Wenruo




On 2021/8/19 上午11:40, Simon Glass wrote:

This message comes up a lot when scanning filesystems. It suggests to the
user that there is some sort of error, but in fact there is no reason to
expect that a particular partition has a btrfs filesystem. Other
filesystems don't print this error.

Turn it into a debug message.

Signed-off-by: Simon Glass 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---

  fs/btrfs/disk-io.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 349411c3ccd..1b69346e231 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -886,7 +886,11 @@ static int btrfs_scan_fs_devices(struct blk_desc *desc,

ret = btrfs_scan_one_device(desc, part, fs_devices, _devs);
if (ret) {
-   fprintf(stderr, "No valid Btrfs found\n");
+   /*
+* Avoid showing this when probing for a possible Btrfs
+*
+* fprintf(stderr, "No valid Btrfs found\n");
+*/
return ret;
}
return 0;
@@ -975,7 +979,7 @@ struct btrfs_fs_info *open_ctree_fs_info(struct blk_desc 
*desc,
disk_super = fs_info->super_copy;
ret = btrfs_read_dev_super(desc, part, disk_super);
if (ret) {
-   printk("No valid btrfs found\n");
+   debug("No valid btrfs found\n");
goto out_devices;
}




Re: [PATCH] btrfs: Use default subvolume as filesystem root

2021-08-01 Thread Qu Wenruo




On 2021/8/2 上午4:52, Matwey V. Kornilov wrote:

BTRFS volume consists of a number of subvolumes which can be mounted separately
from each other. The top-level subvolume always exists even if no subvolumes
were created manually. A subvolume can be denoted as the default subvolume i.e.
the subvolume which is mounted by default.

The default "default subvolume" is the top-level one, but this is far from the
common practices used in the wild. For instance, openSUSE provides an OS
snapshot/rollback feature based on BTRFS. To achieve this, the actual OS root
filesystem is located into a separate subvolume which is "default" but not
"top-level". That means that the /boot/dtb/ directory is also located inside
this default subvolume instead of top-level one.

However, the existing btrfs u-boot driver always uses the top-level subvolume
as the filesystem root. This behaviour 1) is inconsistent with

 mount /dev/sda1 /target

command, which mount the default subvolume 2) leads to the issues when
/boot/dtb cannot be found properly (see the reference).


I also noticed the problem in the past, but forgot to fix it



This patch uses the default subvolume as the filesystem root to overcome
mentioned issues.

Reference: https://bugzilla.suse.com/show_bug.cgi?id=1185656
Signed-off-by: Matwey V. Kornilov 


Reviewed-by: Qu Wenruo 

Thanks,
Qu


---
  fs/btrfs/disk-io.c | 38 +++---
  1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 349411c3cc..12f9579fcf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -804,6 +804,30 @@ static int setup_root_or_create_block(struct btrfs_fs_info 
*fs_info,
return 0;
  }
  
+static int get_default_subvolume(struct btrfs_fs_info *fs_info,

+struct btrfs_key *key_ret)
+{
+   struct btrfs_root *root = fs_info->tree_root;
+   struct btrfs_dir_item *dir_item;
+   struct btrfs_path path;
+   int ret = 0;
+
+   btrfs_init_path();
+
+   dir_item = btrfs_lookup_dir_item(NULL, root, ,
+BTRFS_ROOT_TREE_DIR_OBJECTID,
+"default", 7, 0);
+   if (IS_ERR(dir_item)) {
+   ret = PTR_ERR(dir_item);
+   goto out;
+   }
+
+   btrfs_dir_item_key_to_cpu(path.nodes[0], dir_item, key_ret);
+out:
+   btrfs_release_path();
+   return ret;
+}
+
  int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info)
  {
struct btrfs_super_block *sb = fs_info->super_copy;
@@ -833,9 +857,17 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info)
  
  	fs_info->last_trans_committed = generation;
  
-	key.objectid = BTRFS_FS_TREE_OBJECTID;

-   key.type = BTRFS_ROOT_ITEM_KEY;
-   key.offset = (u64)-1;
+   ret = get_default_subvolume(fs_info, );
+   if (ret) {
+   /*
+* The default dir item isn't there. Linux kernel behaviour is
+* to silently use the top-level subvolume in this case.
+*/
+   key.objectid = BTRFS_FS_TREE_OBJECTID;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   }
+
fs_info->fs_root = btrfs_read_fs_root(fs_info, );
  
  	if (IS_ERR(fs_info->fs_root))






Re: [PATCH 04/49] btrfs: Use U-Boot API for decompression

2021-05-03 Thread Qu Wenruo




On 2021/5/4 上午7:10, Simon Glass wrote:

Use the common function to avoid code duplication.

Signed-off-by: Simon Glass 


Acked-by: Qu Wenruo 

Thanks,
Qu

---

(no changes since v1)

  fs/btrfs/compression.c | 51 +-
  1 file changed, 5 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 23efefa1997..7adfbb04a7c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -6,6 +6,7 @@
   */

  #include "btrfs.h"
+#include 
  #include 
  #include 
  #include 
@@ -136,54 +137,12 @@ static u32 decompress_zlib(const u8 *_cbuf, u32 clen, u8 
*dbuf, u32 dlen)

  static u32 decompress_zstd(const u8 *cbuf, u32 clen, u8 *dbuf, u32 dlen)
  {
-   ZSTD_DStream *dstream;
-   ZSTD_inBuffer in_buf;
-   ZSTD_outBuffer out_buf;
-   void *workspace;
-   size_t wsize;
-   u32 res = -1;
-
-   wsize = ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT);
-   workspace = malloc(wsize);
-   if (!workspace) {
-   debug("%s: cannot allocate workspace of size %zu\n", __func__,
- wsize);
-   return -1;
-   }
-
-   dstream = ZSTD_initDStream(ZSTD_BTRFS_MAX_INPUT, workspace, wsize);
-   if (!dstream) {
-   printf("%s: ZSTD_initDStream failed\n", __func__);
-   goto err_free;
-   }
+   struct abuf in, out;

-   in_buf.src = cbuf;
-   in_buf.pos = 0;
-   in_buf.size = clen;
+   abuf_init_set(, (u8 *)cbuf, clen);
+   abuf_init_set(, dbuf, dlen);

-   out_buf.dst = dbuf;
-   out_buf.pos = 0;
-   out_buf.size = dlen;
-
-   while (1) {
-   size_t ret;
-
-   ret = ZSTD_decompressStream(dstream, _buf, _buf);
-   if (ZSTD_isError(ret)) {
-   printf("%s: ZSTD_decompressStream error %d\n", __func__,
-  ZSTD_getErrorCode(ret));
-   goto err_free;
-   }
-
-   if (in_buf.pos >= clen || !ret)
-   break;
-   }
-
-   res = out_buf.pos;
-
-err_free:
-   free(workspace);
-   return res;
+   return zstd_decompress(, );
  }

  u32 btrfs_decompress(u8 type, const char *c, u32 clen, char *d, u32 dlen)



[PATCH U-boot v2] fs: btrfs: fix the false alert of decompression failure

2021-04-17 Thread Qu Wenruo
There are some cases where decompressed sectors can have padding zeros.

In kernel code, we have lines to address such situation:

/*
 * btrfs_getblock is doing a zero on the tail of the page too,
 * but this will cover anything missing from the decompressed
 * data.
 */
if (bytes < destlen)
memset(kaddr+bytes, 0, destlen-bytes);
kunmap_local(kaddr);

But not in U-boot code, thus we have some reports of U-boot failed to
read compressed files in btrfs.

Fix it by doing the same thing of the kernel, for both inline and
regular compressed extents.

Reported-by: Matwey Kornilov 
Link: https://bugzilla.suse.com/show_bug.cgi?id=1183717
Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and 
btrfs_read_extent_reg()")
Signed-off-by: Qu Wenruo 
---
Changelog:
v2:
- Fix the bug for regular and inline compressed extents
---
 fs/btrfs/inode.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 019d532a1a4b..2c2379303d74 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -390,10 +390,16 @@ int btrfs_read_extent_inline(struct btrfs_path *path,
   csize);
ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi),
   cbuf, csize, dbuf, dsize);
-   if (ret < 0 || ret != dsize) {
+   if (ret == (u32)-1) {
ret = -EIO;
goto out;
}
+   /*
+* The compressed part ends before sector boundary, the remaining needs
+* to be zeroed out.
+*/
+   if (ret < dsize)
+   memset(dbuf + ret, 0, dsize - ret);
memcpy(dest, dbuf, dsize);
ret = dsize;
 out:
@@ -494,10 +500,16 @@ int btrfs_read_extent_reg(struct btrfs_path *path,
 
ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi), cbuf,
   csize, dbuf, dsize);
-   if (ret != dsize) {
+   if (ret == (u32)-1) {
ret = -EIO;
goto out;
}
+   /*
+* The compressed part ends before sector boundary, the remaining needs
+* to be zeroed out.
+*/
+   if (ret < dsize)
+   memset(dbuf + ret, 0, dsize - ret);
/* Then copy the needed part */
memcpy(dest, dbuf + btrfs_file_extent_offset(leaf, fi), len);
ret = len;
-- 
2.31.1



[PATCH UBoot] fs: btrfs: fix the false alert of decompression failure

2021-04-17 Thread Qu Wenruo
There are some cases where decompressed sectors can have padding zeros.

In kernel code, we have lines to address such situation:

/*
 * btrfs_getblock is doing a zero on the tail of the page too,
 * but this will cover anything missing from the decompressed
 * data.
 */
if (bytes < destlen)
memset(kaddr+bytes, 0, destlen-bytes);
kunmap_local(kaddr);

But not in U-boot code, thus we have some reports of U-boot failed to
read compressed files in btrfs.

Fix it by doing the same thing of the kernel.

Reported-by: Matwey Kornilov 
Link: https://bugzilla.suse.com/show_bug.cgi?id=1183717
Fixes: a26a6bedafcf ("fs: btrfs: Introduce btrfs_read_extent_inline() and 
btrfs_read_extent_reg()")
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/inode.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 019d532a1a4b..f780c53d5250 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -390,10 +390,16 @@ int btrfs_read_extent_inline(struct btrfs_path *path,
   csize);
ret = btrfs_decompress(btrfs_file_extent_compression(leaf, fi),
   cbuf, csize, dbuf, dsize);
-   if (ret < 0 || ret != dsize) {
+   if (ret == (u32)-1) {
ret = -EIO;
goto out;
}
+   /*
+* The compressed part ends before sector boundary, the remaining needs
+* to be zeroed out.
+*/
+   if (ret < dsize)
+   memset(dbuf + ret, 0, dsize - ret);
memcpy(dest, dbuf, dsize);
ret = dsize;
 out:
-- 
2.31.1



Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1

2021-02-10 Thread Qu Wenruo




On 2021/2/11 上午12:21, Marek Behun wrote:

On Wed, 10 Feb 2021 09:20:11 +0800
Qu Wenruo  wrote:


You're correct, the kernel is using new schema, btrfs_get_fs_root(),
which only requires root objectid and completely get rid of the
offset/type, which is far less possible to call with wrong parameters.

It would be a good timing to sync the code between kernel and
progs/u-boot now.


So do you agree with this patch? If so, can you add Reviewed-by? Thanks.


I mean, to change btrfs-progs interface to follow kernel
btrfs_get_fs_root() schema, just pass objectid, without the need for
btrfs_key.

Thanks,
Qu


Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1

2021-02-09 Thread Qu Wenruo




On 2021/2/10 上午9:05, Marek Behun wrote:

On Wed, 10 Feb 2021 08:09:14 +0800
Qu Wenruo  wrote:


On 2021/2/10 上午1:33, Marek Behún wrote:

When the btrfs_read_fs_root() function is searching a ROOT_ITEM with
location key offset other than -1, it currently fails via BUG_ON.

The offset can have other value than -1, though. This can happen for
example if a subvolume is renamed:

$ btrfs subvolume create X && sync
Create subvolume './X'
$ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: X$
  location key (270 ROOT_ITEM 18446744073709551615) type DIR
  transid 283 data_len 0 name_len 1
  name: X
$ mv X Y && sync
$ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: Y$
  location key (270 ROOT_ITEM 0) type DIR
  transid 285 data_len 0 name_len 1
  name: Y

As can be seen the offset changed from -1ULL to 0.



Offset for subvolume ROOT_ITEM can be other values, especially for
snapshot that offset is the transid when it get created.

But the problem is, if we call btrfs_read_fs_root() for subvolume tree,
the offset of the key really doesn't matter, the only important thing is
the objectid.

Thus we use that BUG_ON() to catch careless callers.

Would you please provide a case where we wrongly call
btrfs_read_fs_root() with incorrect offset inside btrfs-progs/uboot?

I believe that would be the proper way to fix.


Qu,

this can be triggered in U-Boot when listing a directory containing a
subvolume that was renamed:
   - create a subvolume && sync
   - rename subvolume && sync
   - umount, reboot, list the directory containing the subvolume in
 u-boot
It will also break when you want to read a file that has a subvolume in
it's path (e.g. `read mmc 0 0x1000 /renamed-subvol/file`).

I found out this btrfs-progs commit:
   
https://github.com/kdave/btrfs-progs/commit/10f1af0fe7de5a0310657993c7c21a1d78087e56
This commit ensures that while searching a directory recursively, when
a ROOT_ITEM is encountered, the offset of its location is changed to -1
before passing the location to btrfs_read_fs_root().


That's what I expect the code to do, but you're right, if kernel is not 
doing it anymore, I prefer the kernel behavior.




So maybe we could do this in u-boot as well, but why do this? Linux'
btrfs driver does not check whether the offset is -1. So why do it here?


You're correct, the kernel is using new schema, btrfs_get_fs_root(), 
which only requires root objectid and completely get rid of the 
offset/type, which is far less possible to call with wrong parameters.


It would be a good timing to sync the code between kernel and 
progs/u-boot now.




BTW, Qu, I think we have to change the BUG_ON code in U-Boot's btrfs
driver. BUG_ON in U-Boot calls a complete SOC reset. We can't break
whole U-Boot simply because btrfs partition contains broken data.
U-Boot commands must fail in such a case, not reset the SOC.


Well, progs (and even kernel) is a mine-field for BUG_ON()s.

But at least for kernel, it's protected by tree-checker which rejects 
invalid on-disk data before it reaches btrfs code, thus mostly kernel 
BUG_ON()s are really hard to hit (a lot of them are even impossible to 
hit after the introduction of tree-checker), and indicate real problems.


For now, the BUG_ON()s in U-boot still indicates problems that we can't 
really solve or doesn't expect at all in btrfs realm, e.g. the BUG_ON() 
you're hitting (call sites problem).


I admit it's a pain in the ass for full SoC reset, but I don't have any 
better alternatives yet.


The mid to long term solution would be introducing tree-checker to 
U-boot, so that the remaining BUG_ON()s are really code bugs.


Thanks,
Qu



Marek





Re: [PATCH u-boot 1/2] fs: btrfs: skip xattrs in directory listing

2021-02-09 Thread Qu Wenruo




On 2021/2/10 上午2:05, Marek Behún wrote:

Skip xattrs in directory listing. U-Boot filesystem drivers do not list
xattrs.

Signed-off-by: Marek Behún 
Cc: David Sterba 
Cc: Qu Wenruo 
Cc: Tom Rini 


Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
  fs/btrfs/btrfs.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 346b2c4341..6b4c5feb53 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -29,7 +29,6 @@ static int show_dir(struct btrfs_root *root, struct 
extent_buffer *eb,
[BTRFS_FT_FIFO] = "FIFO",
[BTRFS_FT_SOCK] = "SOCK",
[BTRFS_FT_SYMLINK]  = "SYMLINK",
-   [BTRFS_FT_XATTR]= "XATTR"
};
u8 type = btrfs_dir_type(eb, di);
char namebuf[BTRFS_NAME_LEN];
@@ -38,6 +37,10 @@ static int show_dir(struct btrfs_root *root, struct 
extent_buffer *eb,
time_t mtime;
int ret = 0;

+   /* skip XATTRs in directory listing */
+   if (type == BTRFS_FT_XATTR)
+   return 0;
+
btrfs_dir_item_key_to_cpu(eb, di, );

if (key.type == BTRFS_ROOT_ITEM_KEY) {



Re: [PATCH u-boot 2/2] fs: btrfs: change directory list output to be aligned as before

2021-02-09 Thread Qu Wenruo




On 2021/2/10 上午2:05, Marek Behún wrote:

Since commit 325dd1f642dd ("fs: btrfs: Use btrfs_iter_dir() to ...")
when btrfs is listing a directory, the output is not aligned:

15  Wed Sep 09 13:20:03 2020  boot.scr -> @/boot/boot.scr
 0  Tue Feb 02 12:42:09 2021  @
   108  Tue Feb 02 12:54:04 2021  1.info

Return back to how it was displayed previously, i.e.:

15  Wed Sep 09 13:20:03 2020  boot.scr -> @/boot/boot.scr
 0  Tue Feb 02 12:42:09 2021  @
   <   >108  Tue Feb 02 12:54:04 2021  1.info

Instead of '', print '<   >', as ext4 driver.

If an unknown directory item type is encountered, we will print the type
number left padded with spaces, enclosed by '?', instead of '<' and '>',
i.e.:

   ? 30?.  name

Signed-off-by: Marek Behún 
Fixes: 325dd1f642dd ("fs: btrfs: Use btrfs_iter_dir() to replace ...")
Cc: David Sterba 
Cc: Qu Wenruo 
Cc: Tom Rini 


Reviewed-by: Qu Wenruo 



---
  fs/btrfs/btrfs.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/btrfs.c b/fs/btrfs/btrfs.c
index 6b4c5feb53..52a243a659 100644
--- a/fs/btrfs/btrfs.c
+++ b/fs/btrfs/btrfs.c
@@ -22,13 +22,13 @@ static int show_dir(struct btrfs_root *root, struct 
extent_buffer *eb,
struct btrfs_inode_item ii;
struct btrfs_key key;
static const char* dir_item_str[] = {
-   [BTRFS_FT_REG_FILE] = "FILE",
+   [BTRFS_FT_REG_FILE] = "   ",
[BTRFS_FT_DIR]  = "DIR",
-   [BTRFS_FT_CHRDEV]   = "CHRDEV",
-   [BTRFS_FT_BLKDEV]   = "BLKDEV",
-   [BTRFS_FT_FIFO] = "FIFO",
-   [BTRFS_FT_SOCK] = "SOCK",
-   [BTRFS_FT_SYMLINK]  = "SYMLINK",
+   [BTRFS_FT_CHRDEV]   = "CHR",
+   [BTRFS_FT_BLKDEV]   = "BLK",
+   [BTRFS_FT_FIFO] = "FIF",
+   [BTRFS_FT_SOCK] = "SCK",
+   [BTRFS_FT_SYMLINK]  = "SYM",


Since btrfs-progs also use similar output for its dump-tree, I guess
it's also possible to use the similar 3 chars output, except the FILE.

Thanks,
Qu

};
u8 type = btrfs_dir_type(eb, di);
char namebuf[BTRFS_NAME_LEN];
@@ -93,7 +93,7 @@ static int show_dir(struct btrfs_root *root, struct 
extent_buffer *eb,
if (type < ARRAY_SIZE(dir_item_str) && dir_item_str[type])
printf("<%s> ", dir_item_str[type]);
else
-   printf("DIR_ITEM.%u", type);
+   printf("?%3u? ", type);
if (type == BTRFS_FT_CHRDEV || type == BTRFS_FT_BLKDEV) {
ASSERT(key.type == BTRFS_INODE_ITEM_KEY);
printf("%4llu,%5llu  ", btrfs_stack_inode_rdev() >> 20,



Re: [PATCH u-boot] fs: btrfs: do not fail when offset of a ROOT_ITEM is not -1

2021-02-09 Thread Qu Wenruo




On 2021/2/10 上午1:33, Marek Behún wrote:

When the btrfs_read_fs_root() function is searching a ROOT_ITEM with
location key offset other than -1, it currently fails via BUG_ON.

The offset can have other value than -1, though. This can happen for
example if a subvolume is renamed:

   $ btrfs subvolume create X && sync
   Create subvolume './X'
   $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: X$
 location key (270 ROOT_ITEM 18446744073709551615) type DIR
 transid 283 data_len 0 name_len 1
 name: X
   $ mv X Y && sync
   $ btrfs inspect-internal dump-tree /dev/root | grep -B 2 'name: Y$
 location key (270 ROOT_ITEM 0) type DIR
 transid 285 data_len 0 name_len 1
 name: Y

As can be seen the offset changed from -1ULL to 0.



Offset for subvolume ROOT_ITEM can be other values, especially for
snapshot that offset is the transid when it get created.

But the problem is, if we call btrfs_read_fs_root() for subvolume tree,
the offset of the key really doesn't matter, the only important thing is
the objectid.

Thus we use that BUG_ON() to catch careless callers.

Would you please provide a case where we wrongly call
btrfs_read_fs_root() with incorrect offset inside btrfs-progs/uboot?

I believe that would be the proper way to fix.

Thanks,
Qu


Do not fail in this case.

Signed-off-by: Marek Behún 
Cc: David Sterba 
Cc: Qu Wenruo 
Cc: Tom Rini 
---
  fs/btrfs/disk-io.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b332ecb796..c6fdec95c1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -732,8 +732,7 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info 
*fs_info,
return fs_info->chunk_root;
if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
return fs_info->csum_root;
-   BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
-  location->offset != (u64)-1);
+   BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID);

node = rb_search(_info->fs_root_tree, (void *),
 btrfs_fs_roots_compare_objectids, NULL);



Re: [PATCH] fs: btrfs: Select SHA256 in Kconfig

2021-01-27 Thread Qu Wenruo




On 2021/1/27 下午8:01, David Sterba wrote:

On Wed, Jan 27, 2021 at 10:42:30AM +0100, matthias@kernel.org wrote:

From: Matthias Brugger 

Since commit 565a4147d17a ("fs: btrfs: Add more checksum algorithms")
btrfs uses the sha256 checksum algorithm. But Kconfig lacks to select
it. This leads to compilation errors:
fs/built-in.o: In function `hash_sha256':
fs/btrfs/crypto/hash.c:25: undefined reference to `sha256_starts'
fs/btrfs/crypto/hash.c:26: undefined reference to `sha256_update'
fs/btrfs/crypto/hash.c:27: undefined reference to `sha256_finish'

Signed-off-by: Matthias Brugger 


So this is a fix for u-boot, got me confused and not for the first time
as there's Kconfig and the same fs/btrfs/ directory structure.


Well, sometimes too unified file structure/code base can also be a problem.

Considering I'm also going to continue cross-porting more code to 
U-boot, any recommendation on this?

Using different prefix?

Thanks,
Qu



Re: [PATCH] fs: btrfs: Select SHA256 in Kconfig

2021-01-27 Thread Qu Wenruo




On 2021/1/27 下午5:42, matthias@kernel.org wrote:

From: Matthias Brugger 

Since commit 565a4147d17a ("fs: btrfs: Add more checksum algorithms")
btrfs uses the sha256 checksum algorithm. But Kconfig lacks to select
it. This leads to compilation errors:
fs/built-in.o: In function `hash_sha256':
fs/btrfs/crypto/hash.c:25: undefined reference to `sha256_starts'
fs/btrfs/crypto/hash.c:26: undefined reference to `sha256_update'
fs/btrfs/crypto/hash.c:27: undefined reference to `sha256_finish'

Signed-off-by: Matthias Brugger 


Oh, forgot to select sha256.

Thanks for the fix.

Reviewed-by: Qu Wenruo 

Thanks,
Qu


---

  fs/btrfs/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index f302b1fbef..2a32f42ad1 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -4,6 +4,7 @@ config FS_BTRFS
select LZO
select ZSTD
select RBTREE
+   select SHA256
help
  This provides a single-device read-only BTRFS support. BTRFS is a
  next-generation Linux file system based on the copy-on-write



Re: [PATCH 1/1] fs: btrfs: simplify close_ctree_fs_info()

2020-12-25 Thread Qu Wenruo




On 2020/12/25 下午8:45, Heinrich Schuchardt wrote:

At the beginning of close_ctree_fs_info() the value 0 is assigned to err
and never changed before testing it.

Let's get rid of the superfluous variable.

Fixes: f06bfcf54d0e ("fs: btrfs: Crossport open_ctree_fs_info() from 
btrfs-progs")


This is because current btrfs implementation of U-boot only support RO 
mount.


In btrfs-progs, since we support RW support, we will try to commit 
transaction and we can hit error during transaction commit.
Thus we use @ret to record current error and @error to record the final 
return value.


I don't believe U-boot would support btrfs read-write any time soon, 
thus the cleanup should be OK.


Reviewed-by: Qu Wenruo 

Thanks,
Qu


Signed-off-by: Heinrich Schuchardt 
---
  fs/btrfs/disk-io.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 01e7cee520..b332ecb796 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1030,7 +1030,6 @@ out:
  int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
  {
int ret;
-   int err = 0;

free_fs_roots_tree(_info->fs_root_tree);

@@ -1038,9 +1037,7 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
ret = btrfs_close_devices(fs_info->fs_devices);
btrfs_cleanup_all_caches(fs_info);
btrfs_free_fs_info(fs_info);
-   if (!err)
-   err = ret;
-   return err;
+   return ret;
  }

  int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid)
--
2.29.2





Re: [PATCH 4/4] fs: btrfs: initialize @ii in show_dir() to make coverity happy

2020-11-01 Thread Qu Wenruo



On 2020/11/2 下午3:24, Marek Behun wrote:
> On Mon, 2 Nov 2020 08:27:16 +0800
> Qu Wenruo  wrote:
> 
>> Thus I really tend to believe it's just a bug in coverity.
>> All locations accessing @ii all have its key.type checked to ensure it get 
>> filled in the first place.
> 
> If this is a bug in coverity, we should fix coverity, not add extra
> code to U-Boot, even if it is just one instruction. That simply stinks
> the same way like when systemd crashed when "debug" parameter was
> present in /proc/cmdline, and they sent a patch to kernel which removed
> the "debug" parameter, instead of fixing systemd. IMO.

Yep, you're right.

Please discard this patch.

Thanks,
Qu

> 
> Marek
> 



Re: [PATCH 2/4] fs: btrfs: volumes: prevent overflow for multiplying

2020-11-01 Thread Qu Wenruo


On 2020/11/2 上午8:20, Qu Wenruo wrote:
> 
> 
> On 2020/11/2 上午7:02, Marek Behun wrote:
>> On Sat, 31 Oct 2020 09:07:50 +0800
>> Qu Wenruo  wrote:
>>
>>> In real world, this should not cause problem as we have device number
>>> limit thus it won't go beyond 4G for a single stripe.
>>
>> So we can't run into this overflow in U-Boot because only one device is
>> supported? But in Linux we can run into this issue?
>>
> In kernel, we have device number check, IIRC for default nodesize is
> just several donzens of devices.
> And since each stripe is only 64K fixed in btrfs, you can see it won't
> go beyond 4G even in kernel.

Sorry, the 64K stripe_len is only for RAID56, and for kernel, we always
use u64 for stripe_len, thus we won't hit the problem.

The problem is in btrfs-progs, where the code comes from, we use int,
other than u64.

Thanks for the hint for the root cause, would related values to be u64
for both btrfs-progs and u-boot to follow the kernel scheme.

Thanks,
Qu
> 
> Thanks,
> Qu
> 



signature.asc
Description: OpenPGP digital signature


  1   2   3   >