Hi,
I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.
A bit of history on the filesystem. It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi. The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown. I have checked the
filesystem with "btrfs check" which has found no errors. The filesystem
mounts on Linux and is functional.
> # btrfs check --check-data-csum /dev/sda2
> Opening filesystem to check...
> Checking filesystem on /dev/sda2
> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking csums against data
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 5070573568 bytes used, no error found
> total csum bytes: 4451620
> total tree bytes: 370458624
> total fs tree bytes: 353124352
> total extent tree bytes: 10010624
> btree space waste bytes: 62303284
> file data blocks allocated: 6786519040
> referenced 6328619008
I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".
My testing qemu was produced with the following:
> # make qemu-x86_64_defconfig
> # cat << EOF >> .config
> CONFIG_AUTOBOOT=y
> CONFIG_BOOTDELAY=1
> CONFIG_USE_BOOTCOMMAND=y
> CONFIG_BOOTSTD_DEFAULTS=y
> CONFIG_BOOTSTD_FULL=y
> CONFIG_CMD_BOOTFLOW_FULL=y
> CONFIG_BOOTCOMMAND="bootflow scan -lb"
> CONFIG_ENV_IS_NOWHERE=y
> CONFIG_LZ4=y
> CONFIG_BZIP2=y
> CONFIG_ZSTD=y
> CONFIG_FS_BTRFS=y
> CONFIG_CMD_BTRFS=y
> CONFIG_GZIP=y
> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
> EOF
> # make -j24
bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
> / {
> bootstd {
> compatible = "u-boot,boot-std";
> filename-prefixes = "/@boot/", "/boot/", "/";
> bootdev-order = "scsi";
> extlinux {
> compatible = "u-boot,extlinux";
> };
> };
> };
The VM was run with
> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown
> -drive file=/mnt/dbg/u-boot-debug.img
The error message I recive on boot is
> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
> BUG!
> resetting ...
I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here. This print statement is before the decrement.
> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n",
> eb->start, eb->refs);
The last message before the crash reported eb->start to be "0", with 0 refs.
> free_extent_buffer_internal: eb->start[0] eb->refs[0]
The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.
I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:
> free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2. 249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.
The failing boot log:
> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> Trying to boot from SPI
> Jumping to 64-bit U-Boot: Note many features are missing
>
>
> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
>
> CPU: QEMU Virtual CPU version 2.5+
> DRAM: 128 MiB
> Core: 13 devices, 13 uclasses, devicetree: separate
> Loading Environment from nowhere... OK
> Model: QEMU x86 (I440FX)
> Net: e1000: 00:00:00:00:00:00
>
> Error: e1000#0 No valid MAC address found.
> eth_initialize() No ethernet found.
>
>
> Hit any key to stop autoboot: 0
> Scanning for bootflows in all bootdevs
> Seq Method State UclassPart Name Filename
> --- --- --
>
> scanning bus for devices...
> Target spinup took 0 ms.
> SATA link 1 timeout.
> Target spinup took 0 ms.
> SATA link 3 timeout.
> SATA link 4 timeout.
> SATA link 5 timeout.
> AHCI 0001. 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
> flags: 64bit ncq only
>