Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Sachi King
On Monday 22 April 2024 5:15:50 PM AEST Qu Wenruo wrote:
> 
> 在 2024/4/22 16:07, Sachi King 写道:
> > Hi,
> >
> > I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
> > it's a bug and not a corruption issue.
> >
> > A bit of history on the filesystem.  It is a fairly new filesystem as it was
> > being used to give me access to test a wayland application on a
> > Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
> > I'm fairly certain it never had an unclean shutdown.  I have checked the
> > filesystem with "btrfs check" which has found no errors.  The filesystem
> > mounts on Linux and is functional.
> >
> >> # btrfs check --check-data-csum /dev/sda2
> >> Opening filesystem to check...
> >> Checking filesystem on /dev/sda2
> >> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
> >> [1/7] checking root items
> >> [2/7] checking extents
> >> [3/7] checking free space tree
> >> [4/7] checking fs roots
> >> [5/7] checking csums against data
> >> [6/7] checking root refs
> >> [7/7] checking quota groups skipped (not enabled on this FS)
> >> found 5070573568 bytes used, no error found
> >> total csum bytes: 4451620
> >> total tree bytes: 370458624
> >> total fs tree bytes: 353124352
> >> total extent tree bytes: 10010624
> >> btree space waste bytes: 62303284
> >> file data blocks allocated: 6786519040
> >>   referenced 6328619008
> 
> Since btrfs check reports no error, the fs must be valid.
> 
> But considering how new it is, it may be related to some new features
> not properly implemented in Uboot.
> 
> Is it possible to provide the whole binary dump of the fs?

Sure, here's a copy of the disk image.  The BTRFS is at partition 2.
http://maribel.nakato.io/u-boot-newtest.img.zstd

> >
> >
> > I've made an image of the filesystem so I could reproduce the bug in an
> > environment that doesn't require the physical SBC, and have reproduced
> > the issue using the head of the master branch with "qemu-x86_64_defconfig".
> >
> > My testing qemu was produced with the following:
> >> # make qemu-x86_64_defconfig
> >> # cat << EOF >> .config
> >> CONFIG_AUTOBOOT=y
> >> CONFIG_BOOTDELAY=1
> >> CONFIG_USE_BOOTCOMMAND=y
> >> CONFIG_BOOTSTD_DEFAULTS=y
> >> CONFIG_BOOTSTD_FULL=y
> >> CONFIG_CMD_BOOTFLOW_FULL=y
> >> CONFIG_BOOTCOMMAND="bootflow scan -lb"
> >> CONFIG_ENV_IS_NOWHERE=y
> >> CONFIG_LZ4=y
> >> CONFIG_BZIP2=y
> >> CONFIG_ZSTD=y
> >> CONFIG_FS_BTRFS=y
> >> CONFIG_CMD_BTRFS=y
> >> CONFIG_GZIP=y
> >> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
> >> EOF
> >> # make -j24
> >
> > bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
> >> / {
> >>  bootstd {
> >>  compatible = "u-boot,boot-std";
> >>  filename-prefixes = "/@boot/", "/boot/", "/";
> >>  bootdev-order = "scsi";
> >>  extlinux {
> >>  compatible = "u-boot,extlinux";
> >>  };
> >>  };
> >> };
> >
> >
> > The VM was run with
> >> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action 
> >> reboot=shutdown -drive file=/mnt/dbg/u-boot-debug.img
> >
> > The error message I recive on boot is
> >> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
> >> BUG!
> >> resetting ...
> >
> >
> > I added a print statement to free_extent_buffer_internal that prints the
> > start address of the extent_buffer as I'm not sure what to be looking for
> > here.  This print statement is before the decrement.
> >> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", 
> >> eb->start, eb->refs);
> >
> > The last message before the crash reported eb->start to be "0", with 0 refs.
> >> free_extent_buffer_internal: eb->start[0] eb->refs[0]
> >
> > The extent at 0 struck me as odd, so I tried commenting out the freeing, by
> > removing the call to free_extent_buffer_final, and this resulted in bootflow
> > succeeding and showing me the boot menu, which suprised me.
> > I expected to see the bug reproduce itself, with refs being zero, but 
> > eb->start
> > pointi

BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Sachi King
Hi,

I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.

A bit of history on the filesystem.  It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown.  I have checked the
filesystem with "btrfs check" which has found no errors.  The filesystem
mounts on Linux and is functional.

> # btrfs check --check-data-csum /dev/sda2
> Opening filesystem to check...
> Checking filesystem on /dev/sda2
> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking csums against data
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 5070573568 bytes used, no error found
> total csum bytes: 4451620
> total tree bytes: 370458624
> total fs tree bytes: 353124352
> total extent tree bytes: 10010624
> btree space waste bytes: 62303284
> file data blocks allocated: 6786519040
>  referenced 6328619008


I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".

My testing qemu was produced with the following:
> # make qemu-x86_64_defconfig
> # cat << EOF >> .config
> CONFIG_AUTOBOOT=y
> CONFIG_BOOTDELAY=1
> CONFIG_USE_BOOTCOMMAND=y
> CONFIG_BOOTSTD_DEFAULTS=y
> CONFIG_BOOTSTD_FULL=y
> CONFIG_CMD_BOOTFLOW_FULL=y
> CONFIG_BOOTCOMMAND="bootflow scan -lb"
> CONFIG_ENV_IS_NOWHERE=y
> CONFIG_LZ4=y
> CONFIG_BZIP2=y
> CONFIG_ZSTD=y
> CONFIG_FS_BTRFS=y
> CONFIG_CMD_BTRFS=y
> CONFIG_GZIP=y
> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
> EOF
> # make -j24

bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
> / {
> bootstd {
> compatible = "u-boot,boot-std";
> filename-prefixes = "/@boot/", "/boot/", "/";
> bootdev-order = "scsi";
> extlinux {
> compatible = "u-boot,extlinux";
> };
> };
> };


The VM was run with
> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown 
> -drive file=/mnt/dbg/u-boot-debug.img

The error message I recive on boot is
> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
> BUG!
> resetting ...


I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.
> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", 
> eb->start, eb->refs);

The last message before the crash reported eb->start to be "0", with 0 refs.
> free_extent_buffer_internal: eb->start[0] eb->refs[0]

The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.

I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:
> free_extent_buffer_internal: eb->start[249c000] eb->refs[2]

Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2.  249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.

The failing boot log:
> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> Trying to boot from SPI
> Jumping to 64-bit U-Boot: Note many features are missing
> 
> 
> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> 
> CPU:   QEMU Virtual CPU version 2.5+
> DRAM:  128 MiB
> Core:  13 devices, 13 uclasses, devicetree: separate
> Loading Environment from nowhere... OK
> Model: QEMU x86 (I440FX)
> Net:   e1000: 00:00:00:00:00:00
>
> Error: e1000#0 No valid MAC address found.
>   eth_initialize() No ethernet found.
> 
> 
> Hit any key to stop autoboot:  0 
> Scanning for bootflows in all bootdevs
> Seq  Method   State   UclassPart  Name  Filename
> ---  ---  --        
> 
> scanning bus for devices...
> Target spinup took 0 ms.
> SATA link 1 timeout.
> Target spinup took 0 ms.
> SATA link 3 timeout.
> SATA link 4 timeout.
> SATA link 5 timeout.
> AHCI 0001. 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
> flags: 64bit ncq only 
>