Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Sachi King
On Monday 22 April 2024 5:15:50 PM AEST Qu Wenruo wrote:
> 
> 在 2024/4/22 16:07, Sachi King 写道:
> > Hi,
> >
> > I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
> > it's a bug and not a corruption issue.
> >
> > A bit of history on the filesystem.  It is a fairly new filesystem as it was
> > being used to give me access to test a wayland application on a
> > Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
> > I'm fairly certain it never had an unclean shutdown.  I have checked the
> > filesystem with "btrfs check" which has found no errors.  The filesystem
> > mounts on Linux and is functional.
> >
> >> # btrfs check --check-data-csum /dev/sda2
> >> Opening filesystem to check...
> >> Checking filesystem on /dev/sda2
> >> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
> >> [1/7] checking root items
> >> [2/7] checking extents
> >> [3/7] checking free space tree
> >> [4/7] checking fs roots
> >> [5/7] checking csums against data
> >> [6/7] checking root refs
> >> [7/7] checking quota groups skipped (not enabled on this FS)
> >> found 5070573568 bytes used, no error found
> >> total csum bytes: 4451620
> >> total tree bytes: 370458624
> >> total fs tree bytes: 353124352
> >> total extent tree bytes: 10010624
> >> btree space waste bytes: 62303284
> >> file data blocks allocated: 6786519040
> >>   referenced 6328619008
> 
> Since btrfs check reports no error, the fs must be valid.
> 
> But considering how new it is, it may be related to some new features
> not properly implemented in Uboot.
> 
> Is it possible to provide the whole binary dump of the fs?

Sure, here's a copy of the disk image.  The BTRFS is at partition 2.
http://maribel.nakato.io/u-boot-newtest.img.zstd

> >
> >
> > I've made an image of the filesystem so I could reproduce the bug in an
> > environment that doesn't require the physical SBC, and have reproduced
> > the issue using the head of the master branch with "qemu-x86_64_defconfig".
> >
> > My testing qemu was produced with the following:
> >> # make qemu-x86_64_defconfig
> >> # cat << EOF >> .config
> >> CONFIG_AUTOBOOT=y
> >> CONFIG_BOOTDELAY=1
> >> CONFIG_USE_BOOTCOMMAND=y
> >> CONFIG_BOOTSTD_DEFAULTS=y
> >> CONFIG_BOOTSTD_FULL=y
> >> CONFIG_CMD_BOOTFLOW_FULL=y
> >> CONFIG_BOOTCOMMAND="bootflow scan -lb"
> >> CONFIG_ENV_IS_NOWHERE=y
> >> CONFIG_LZ4=y
> >> CONFIG_BZIP2=y
> >> CONFIG_ZSTD=y
> >> CONFIG_FS_BTRFS=y
> >> CONFIG_CMD_BTRFS=y
> >> CONFIG_GZIP=y
> >> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
> >> EOF
> >> # make -j24
> >
> > bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
> >> / {
> >>  bootstd {
> >>  compatible = "u-boot,boot-std";
> >>  filename-prefixes = "/@boot/", "/boot/", "/";
> >>  bootdev-order = "scsi";
> >>  extlinux {
> >>  compatible = "u-boot,extlinux";
> >>  };
> >>  };
> >> };
> >
> >
> > The VM was run with
> >> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action 
> >> reboot=shutdown -drive file=/mnt/dbg/u-boot-debug.img
> >
> > The error message I recive on boot is
> >> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
> >> BUG!
> >> resetting ...
> >
> >
> > I added a print statement to free_extent_buffer_internal that prints the
> > start address of the extent_buffer as I'm not sure what to be looking for
> > here.  This print statement is before the decrement.
> >> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", 
> >> eb->start, eb->refs);
> >
> > The last message before the crash reported eb->start to be "0", with 0 refs.
> >> free_extent_buffer_internal: eb->start[0] eb->refs[0]
> >
> > The extent at 0 struck me as odd, so I tried commenting out the freeing, by
> > removing the call to free_extent_buffer_final, and this resulted in bootflow
> > succeeding and showing me the boot menu, which suprised me.
> > I expected to see the bug reproduce itself, with refs being zero, but 
> > eb->start
> > pointing somewhere valid, but I instead got a valid address with refs at 2.
> >
> > I'm assuming that the order free_extent_buffer_internal is called is
> > deterministic, so by counting the print outputs the line that prior held
> > the extent_buffer with a 0 start was replaced with:
> >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
> >
> > Interestingly, as can be seen in the full logs with my included print
> > messages, 249c000 is being used just before this, with a ref count of
> > 2.  249c000 does appear to reach a point where it should have been freed
> > in the past, before it gets used again as seen in both logs.
> >
> > The failing boot log:
> >> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> >> Trying to boot from SPI
> >> Jumping to 64-bit U-Boot: Note many features are missing
> >>
> >>
> >> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> >>
> >> CPU: 

BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Sachi King
Hi,

I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.

A bit of history on the filesystem.  It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown.  I have checked the
filesystem with "btrfs check" which has found no errors.  The filesystem
mounts on Linux and is functional.

> # btrfs check --check-data-csum /dev/sda2
> Opening filesystem to check...
> Checking filesystem on /dev/sda2
> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking csums against data
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 5070573568 bytes used, no error found
> total csum bytes: 4451620
> total tree bytes: 370458624
> total fs tree bytes: 353124352
> total extent tree bytes: 10010624
> btree space waste bytes: 62303284
> file data blocks allocated: 6786519040
>  referenced 6328619008


I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".

My testing qemu was produced with the following:
> # make qemu-x86_64_defconfig
> # cat << EOF >> .config
> CONFIG_AUTOBOOT=y
> CONFIG_BOOTDELAY=1
> CONFIG_USE_BOOTCOMMAND=y
> CONFIG_BOOTSTD_DEFAULTS=y
> CONFIG_BOOTSTD_FULL=y
> CONFIG_CMD_BOOTFLOW_FULL=y
> CONFIG_BOOTCOMMAND="bootflow scan -lb"
> CONFIG_ENV_IS_NOWHERE=y
> CONFIG_LZ4=y
> CONFIG_BZIP2=y
> CONFIG_ZSTD=y
> CONFIG_FS_BTRFS=y
> CONFIG_CMD_BTRFS=y
> CONFIG_GZIP=y
> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
> EOF
> # make -j24

bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
> / {
> bootstd {
> compatible = "u-boot,boot-std";
> filename-prefixes = "/@boot/", "/boot/", "/";
> bootdev-order = "scsi";
> extlinux {
> compatible = "u-boot,extlinux";
> };
> };
> };


The VM was run with
> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown 
> -drive file=/mnt/dbg/u-boot-debug.img

The error message I recive on boot is
> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
> BUG!
> resetting ...


I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.
> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", 
> eb->start, eb->refs);

The last message before the crash reported eb->start to be "0", with 0 refs.
> free_extent_buffer_internal: eb->start[0] eb->refs[0]

The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.

I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:
> free_extent_buffer_internal: eb->start[249c000] eb->refs[2]

Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2.  249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.

The failing boot log:
> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> Trying to boot from SPI
> Jumping to 64-bit U-Boot: Note many features are missing
> 
> 
> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
> 
> CPU:   QEMU Virtual CPU version 2.5+
> DRAM:  128 MiB
> Core:  13 devices, 13 uclasses, devicetree: separate
> Loading Environment from nowhere... OK
> Model: QEMU x86 (I440FX)
> Net:   e1000: 00:00:00:00:00:00
>
> Error: e1000#0 No valid MAC address found.
>   eth_initialize() No ethernet found.
> 
> 
> Hit any key to stop autoboot:  0 
> Scanning for bootflows in all bootdevs
> Seq  Method   State   UclassPart  Name  Filename
> ---  ---  --        
> 
> scanning bus for devices...
> Target spinup took 0 ms.
> SATA link 1 timeout.
> Target spinup took 0 ms.
> SATA link 3 timeout.
> SATA link 4 timeout.
> SATA link 5 timeout.
> AHCI 0001. 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
> flags: 64bit ncq only 
> 

Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Qu Wenruo




在 2024/4/22 16:45, Qu Wenruo 写道:
[...]


I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.

printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n",
eb->start, eb->refs);




Just a small advice, in fact you can go with sandbox mode, running
U-boot in userspace, and bind a host file as a device to test the
filesystem code.

At least that's what I did for most U-boot bugs.

Thanks,
Qu


Re: BTRFS use-after-free bug at free_extent_buffer_internal

2024-04-22 Thread Qu Wenruo




在 2024/4/22 16:07, Sachi King 写道:

Hi,

I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.

A bit of history on the filesystem.  It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown.  I have checked the
filesystem with "btrfs check" which has found no errors.  The filesystem
mounts on Linux and is functional.


# btrfs check --check-data-csum /dev/sda2
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 5070573568 bytes used, no error found
total csum bytes: 4451620
total tree bytes: 370458624
total fs tree bytes: 353124352
total extent tree bytes: 10010624
btree space waste bytes: 62303284
file data blocks allocated: 6786519040
  referenced 6328619008


Since btrfs check reports no error, the fs must be valid.

But considering how new it is, it may be related to some new features
not properly implemented in Uboot.

Is it possible to provide the whole binary dump of the fs?




I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".

My testing qemu was produced with the following:

# make qemu-x86_64_defconfig
# cat << EOF >> .config
CONFIG_AUTOBOOT=y
CONFIG_BOOTDELAY=1
CONFIG_USE_BOOTCOMMAND=y
CONFIG_BOOTSTD_DEFAULTS=y
CONFIG_BOOTSTD_FULL=y
CONFIG_CMD_BOOTFLOW_FULL=y
CONFIG_BOOTCOMMAND="bootflow scan -lb"
CONFIG_ENV_IS_NOWHERE=y
CONFIG_LZ4=y
CONFIG_BZIP2=y
CONFIG_ZSTD=y
CONFIG_FS_BTRFS=y
CONFIG_CMD_BTRFS=y
CONFIG_GZIP=y
CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
EOF
# make -j24


bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:

/ {
 bootstd {
 compatible = "u-boot,boot-std";
 filename-prefixes = "/@boot/", "/boot/", "/";
 bootdev-order = "scsi";
 extlinux {
 compatible = "u-boot,extlinux";
 };
 };
};



The VM was run with

qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown 
-drive file=/mnt/dbg/u-boot-debug.img


The error message I recive on boot is

BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
BUG!
resetting ...



I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.

printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", eb->start, 
eb->refs);


The last message before the crash reported eb->start to be "0", with 0 refs.

free_extent_buffer_internal: eb->start[0] eb->refs[0]


The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.

I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:

free_extent_buffer_internal: eb->start[249c000] eb->refs[2]


Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2.  249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.

The failing boot log:

U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
Trying to boot from SPI
Jumping to 64-bit U-Boot: Note many features are missing


U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)

CPU:   QEMU Virtual CPU version 2.5+
DRAM:  128 MiB
Core:  13 devices, 13 uclasses, devicetree: separate
Loading Environment from nowhere... OK
Model: QEMU x86 (I440FX)
Net:   e1000: 00:00:00:00:00:00

Error: e1000#0 No valid MAC address found.
   eth_initialize() No ethernet found.


Hit any key to stop autoboot:  0
Scanning for bootflows in all bootdevs
Seq  Method   State   UclassPart  Name  Filename
---  ---  --        

scanning bus for devices...
Target spinup took 0 ms.
SATA link 1 timeout.
Target spinup took 0 ms.
SATA link 3 timeout.
SATA link 4 timeout.
SATA link 5 timeout.