On Tue 02-04-19 12:35:07, Greg Kroah-Hartman wrote:
> On Tue, Apr 02, 2019 at 01:08:45PM +0300, Jari Ruusu wrote:
> > To trigger this ext4 file system bug, you need a sparse file with
> > the right sparse pattern on an old-school ext3 file system. I tried
> > simpler ways to trigger it, but those attempts did not
> > trigger the bug. I have provided a compressed sparse file that
> > reliably triggers the bug. The compressed file is 1667256
> > bytes; the uncompressed sparse file is 7369850880 bytes.
> > The following commands demonstrate the problem.
> > 
> >   wget http://www.elisanet.fi/jariruusu/123/sparse-demo.data.xz
> >   xz -d sparse-demo.data.xz
> >   mkfs -t ext3 -b 4096 -e remount-ro -O "^dir_index" /dev/sdc1
> >   mount -t ext3 /dev/sdc1 /mnt
> >   cp -v --sparse=always sparse-demo.data /mnt/aa
> >   cp -v --sparse=always sparse-demo.data /mnt/bb
> >   umount /mnt
> >   mount -t ext3 /dev/sdc1 /mnt
> >   cp -v --sparse=always /mnt/bb /mnt/aa
> > 
> > That last cp command reliably triggers the livelock, and
> > after a reset you have file system corruption to deal with. Deeply
> > unfunny.
> > 
> > The bug is caused by
> > "ext4: brelse all indirect buffer in ext4_ind_remove_space()"
> > upstream commit 674a2b27234d1b7afcb0a9162e81b2e53aeef217, from
> > <[email protected]>, who provided a follow-up patch
> > "ext4: cleanup bh release code in ext4_ind_remove_space()"
> > upstream commit 5e86bdda41534e17621d5a071b294943cae4376e. The
> > problem with that follow-up patch is that it is almost criminally
> > mislabeled. It should have said "fixes ext3 livelock and file
> > system corrupting bug" or something like that, so that Greg KH &
> > Co would have understood that it must be backported to stable
> > kernels too. Now the bug appears to be in all/most stable kernels
> > already.
> > 
> > Below is the buggy patch that causes the problem. Look at those
> > new while loops. Neither loop body changes partial or partial2, so
> > once a while condition is true it is ALWAYS true, and it livelocks.
> > 
> > > --- a/fs/ext4/indirect.c
> > > +++ b/fs/ext4/indirect.c
> > > @@ -1385,10 +1385,14 @@ end_range:
> > >                                      partial->p + 1,
> > >                                      partial2->p,
> > >                                      (chain+n-1) - partial);
> > > -                 BUFFER_TRACE(partial->bh, "call brelse");
> > > -                 brelse(partial->bh);
> > > -                 BUFFER_TRACE(partial2->bh, "call brelse");
> > > -                 brelse(partial2->bh);
> > > +                 while (partial > chain) {
> > > +                         BUFFER_TRACE(partial->bh, "call brelse");
> > > +                         brelse(partial->bh);
> > > +                 }
> > > +                 while (partial2 > chain2) {
> > > +                         BUFFER_TRACE(partial2->bh, "call brelse");
> > > +                         brelse(partial2->bh);
> > > +                 }
> > >                   return 0;
> > >           }
> > >
> > 
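The termination problem in the quoted loops can be shown outside the kernel. The following is a user-space sketch only, not the upstream fix: `struct indirect_model`, `brelse_model()` and `release_chain()` are hypothetical names invented for illustration, and an integer counter stands in for a buffer_head reference. It assumes the intent is to release every chain entry above the first one, which requires stepping the pointer back on each iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the kernel's indirect chain. */
struct indirect_model {
    int *bh;    /* models a held buffer reference; NULL means none */
};

/* Models brelse(): drop one reference if a buffer is held. */
static void brelse_model(int *bh)
{
    if (bh)
        (*bh)--;
}

/*
 * Release every entry above chain[0], walking back toward it.
 * The decrement is what the quoted loops lack: without it, `partial`
 * never changes and `partial > chain` stays true forever.
 * Returns the number of entries visited.
 */
static int release_chain(struct indirect_model *chain,
                         struct indirect_model *partial)
{
    int visited = 0;

    assert(chain != NULL && partial >= chain);
    while (partial > chain) {
        brelse_model(partial->bh);
        partial--;      /* makes progress, so the loop terminates */
        visited++;
    }
    return visited;
}
```

Called with a four-entry chain and `partial` pointing at the last entry, `release_chain()` visits three entries and leaves the first one untouched. Removing the `partial--` line reproduces the non-terminating behaviour described above.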
> > Greg & Co,
> > Please revert that above patch from stable kernels or backport the
> > follow-up patch that fixes the problem.
> 
> So you need 5e86bdda4153 ("ext4: cleanup bh release code in
> ext4_ind_remove_space()") applied to all of the stable and LTS kernels
> at the moment (as that patch only showed up in 5.1-rc1)?
> 
> If so, I need an ack from the ext4 developers/maintainer to do so.

Ack from me, and sorry for missing this brown paper bag bug during
review...

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR
