Re: PROBLEM: IO lockup on reiserfs FS.
On Wed, 5 Aug 2020 12:51:41 -0700
Linus Torvalds wrote:
> On Wed, Aug 5, 2020 at 9:53 AM wrote:
> >
> > It's been over 1 week since I sent this into the reiserfs-devel
> > mailing list. I'm escalating this as the kernel docs recommend.
> > I'm still willing to help debug and test a fix for this problem.
>
> The thing is, you're using an ancient 4.14 kernel, and a filesystem
> that isn't really maintained any more. You'll find very few people
> interested in trying to debug that combination.
>
> You *might* have more luck with a more modern kernel, but even then
> ... reiserfs?
>
>                Linus
>

This bug appears to have been fixed somewhere between the 4.14.X and 5.17.X series. I don't know why the fix wasn't backported, but it doesn't really matter to me, as I can run the newer kernel.

Thanks everyone for your help.

David
Re: PROBLEM: IO lockup on reiserfs FS.
On Wed, Aug 5, 2020 at 5:01 PM wrote:
> On Wed, 5 Aug 2020 12:51:41 -0700
> Linus Torvalds wrote:
> > On Wed, Aug 5, 2020 at 9:53 AM wrote:
> > >
> > > It's been over 1 week since I sent this into the reiserfs-devel
> > > mailing list. I'm escalating this as the kernel docs recommend.
> > > I'm still willing to help debug and test a fix for this problem.
> >
> > The thing is, you're using an ancient 4.14 kernel,
>
> Sorry, I didn't realize kernel development went that fast.
> I did try to go to the 5.X series, but the AMDGPU drivers don't work on
> my SI card anymore (I need to bisect, which takes time and many reboots
> to find the problematic commit).
> I'll try the Radeon-SI driver and see if I can reproduce this reliably.
>
> > and a filesystem
> > that isn't really maintained any more. You'll find very few people
> > interested in trying to debug that combination.
> >
> > You *might* have more luck with a more modern kernel, but even then
> > ... reiserfs?
> >
> >        Linus
> >
>
> Why does no one (I've met others who share a similar sentiment) like
> reiserfs?

Could be because 'others' are all 'virtuous' individuals, employed by 'virtuous' corporations, headquartered at 'virtuous' western countries, whose 'virtuous' governments are able to finance the finest hasbara...er, propaganda, that a corporatocracy...er, 'democracy', can buy.

> I'm not looking for a fight; I'm incredulous. It's a great FS
> that survives oopses, power failures, and random crashes very, very well.
> It's the only FLOSS FS with tail packing.

On a more sober note, Reiser4, Software Framework Release Number (SFRN) 4.0.2, is stable and supersedes the features you appreciate in reiserfs, as Edward stated in his subsequent reply.

>
> Thanks,
> David

Best Professional Regards.
--
Jose R R
http://metztli.it
Download Metztli Reiser4: Debian Buster w/ Linux 5.5.19 AMD64
feats ZSTD compression https://sf.net/projects/metztli-reiser4/
Official current Reiser4 resources: https://reiser4.wiki.kernel.org/
Re: PROBLEM: IO lockup on reiserfs FS.
On 08/06/2020 02:01 AM, hgntk...@vfemail.net wrote:
> On Wed, 5 Aug 2020 12:51:41 -0700
> Linus Torvalds wrote:
> > On Wed, Aug 5, 2020 at 9:53 AM wrote:
> > > It's been over 1 week since I sent this into the reiserfs-devel
> > > mailing list. I'm escalating this as the kernel docs recommend.
> > > I'm still willing to help debug and test a fix for this problem.
> >
> > The thing is, you're using an ancient 4.14 kernel,
>
> Sorry, I didn't realize kernel development went that fast.
> I did try to go to the 5.X series, but the AMDGPU drivers don't work on
> my SI card anymore (I need to bisect, which takes time and many reboots
> to find the problematic commit).
> I'll try the Radeon-SI driver and see if I can reproduce this reliably.
>
> > and a filesystem
> > that isn't really maintained any more. You'll find very few people
> > interested in trying to debug that combination.
> >
> > You *might* have more luck with a more modern kernel, but even then
> > ... reiserfs?
> >
> >        Linus
>
> Why does no one (I've met others who share a similar sentiment) like
> reiserfs? I'm not looking for a fight; I'm incredulous. It's a great FS
> that survives oopses, power failures, and random crashes very, very well.
> It's the only FLOSS FS with tail packing.
>
> Thanks,
> David

Hi David,

The "tail packing" feature that you need is brought to perfection in the Reiser4 file system. Other file systems either don't provide tight packing of records in the storage tree, or they are read-only.

You just need to manually patch (*) the kernel and install a pair of user-space packages (**). The latest stuff (against Linux-5.7) is stable. For older kernels you will need a backport of some fixups (***). We can prepare it for you.

Reiser4 is fully supported. If you run into any problems (including partition check/repair), send a message to the reiserfs-devel mailing list.
(*) https://reiser4.wiki.kernel.org/index.php/Reiser4_Howto
    https://sourceforge.net/projects/reiser4/files/
(**) https://sourceforge.net/projects/reiser4/files/reiser4-utils/
(***) https://marc.info/?l=reiserfs-devel=158086248927420=2

Thanks,
Edward.
Re: PROBLEM: IO lockup on reiserfs FS.
On Wed, 5 Aug 2020 12:51:41 -0700
Linus Torvalds wrote:
> On Wed, Aug 5, 2020 at 9:53 AM wrote:
> >
> > It's been over 1 week since I sent this into the reiserfs-devel
> > mailing list. I'm escalating this as the kernel docs recommend.
> > I'm still willing to help debug and test a fix for this problem.
>
> The thing is, you're using an ancient 4.14 kernel,

Sorry, I didn't realize kernel development went that fast.
I did try to go to the 5.X series, but the AMDGPU drivers don't work on
my SI card anymore (I need to bisect, which takes time and many reboots
to find the problematic commit).
I'll try the Radeon-SI driver and see if I can reproduce this reliably.

> and a filesystem
> that isn't really maintained any more. You'll find very few people
> interested in trying to debug that combination.
>
> You *might* have more luck with a more modern kernel, but even then
> ... reiserfs?
>
>                Linus
>

Why does no one (I've met others who share a similar sentiment) like
reiserfs? I'm not looking for a fight; I'm incredulous. It's a great FS
that survives oopses, power failures, and random crashes very, very well.
It's the only FLOSS FS with tail packing.

Thanks,
David
Re: PROBLEM: IO lockup on reiserfs FS.
On Wed, Aug 5, 2020 at 9:53 AM wrote:
>
> It's been over 1 week since I sent this into the reiserfs-devel mailing
> list. I'm escalating this as the kernel docs recommend.
> I'm still willing to help debug and test a fix for this problem.

The thing is, you're using an ancient 4.14 kernel, and a filesystem
that isn't really maintained any more. You'll find very few people
interested in trying to debug that combination.

You *might* have more luck with a more modern kernel, but even then
... reiserfs?

               Linus
Re: PROBLEM: IO lockup on reiserfs FS.
It's been over 1 week since I sent this into the reiserfs-devel mailing list. I'm escalating this as the kernel docs recommend. I'm still willing to help debug and test a fix for this problem.

"Given enough eyeballs, all bugs are shallow." This bug is visible; could we please quash it?

Original message: https://lkml.org/lkml/2020/7/28/1435
Filed bug: https://bugzilla.kernel.org/show_bug.cgi?id=208719

Thanks,
David
PROBLEM: IO lockup on reiserfs FS.
Hello,

When running rtorrent (in sync-to-file mode), my kernel eventually stops writing to the disk, causing all access to md7 to hang, and eventually the kernel becomes totally unresponsive.

EDIT: I have since gotten a response in the bug tracker, and I am filing this to the reiserfs maintainers. (I'd find the file and report to the correct maintainer, but the docs are incomplete when it comes to discovering the source of the hung tasks in my case. I have filed bugs to correct this.)

I still have no clue how to get gdb to print out where exactly this is happening. It can't find the __schedule function in the kernel or in the reiserfs.ko object file. I'm not even sure __schedule is the problem, as it seems to be lock-related, which would point to __mutex_lock.isra.

EDIT: I've managed to verify that all disks are accessible (smartctl -a /dev/sdX) and that the RAID array is working and accessible (mdadm --detail, and echo-ing check/idle to the array and seeing progress).

Keywords: Hung tasks, Lockup, reiserfs, RAID.

Linux version 4.14.184-nopreempt-AMDGPU-dav9 (root@Phenom-II-x6) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Fri Jun 19 17:02:34 UTC 2020

I didn't have Internet before 4.9.X, so I can't say whether any version of the Linux kernel was free of this problem.

[68812.480459] Not tainted 4.14.184-nopreempt-AMDGPU-dav9 #1
[68812.480464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68812.480469] CacheThread_Blo D0 9414 9082 0x0080
[68812.480476] Call Trace:
[68812.480494] __schedule+0x29e/0x6c0
[68812.480505] schedule+0x32/0x80
[68812.480513] schedule_preempt_disabled+0xa/0x10
[68812.480520] __mutex_lock.isra.1+0x26b/0x4e0
[68812.480550] ? do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480570] do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480586] ? __switch_to_asm+0x35/0x70
[68812.480588] ? __switch_to_asm+0x41/0x70
[68812.480590] ? __switch_to_asm+0x35/0x70
[68812.480592] ? __switch_to_asm+0x41/0x70
[68812.480593] ? __switch_to_asm+0x35/0x70
[68812.480595] ? __switch_to_asm+0x41/0x70
[68812.480597] ? __switch_to_asm+0x35/0x70
[68812.480601] journal_begin+0x80/0x140 [reiserfs]
[68812.480606] reiserfs_dirty_inode+0x3d/0xa0 [reiserfs]
[68812.480609] ? __switch_to+0x1ee/0x3f0
[68812.480610] ? __switch_to+0x1ee/0x3f0
[68812.480612] __mark_inode_dirty+0x163/0x350
[68812.480615] generic_update_time+0x79/0xc0
[68812.480617] ? current_time+0x38/0x70
[68812.480619] file_update_time+0xbe/0x110
[68812.480622] __generic_file_write_iter+0x99/0x1b0
[68812.480624] generic_file_write_iter+0xe2/0x1c0
[68812.480626] __vfs_write+0x102/0x180
[68812.480628] vfs_write+0xb0/0x190
[68812.480630] SyS_pwrite64+0x90/0xb0
[68812.480632] do_syscall_64+0x6e/0x110
[68812.480634] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[68812.480637] RIP: 0033:0x7f8a92976983
[68812.480638] RSP: 002b:7f8a779e1630 EFLAGS: 0293 ORIG_RAX: 0012
[68812.480640] RAX: ffda RBX: 7f8a779e1640 RCX: 7f8a92976983
[68812.480641] RDX: 0128 RSI: 01790ddf3c00 RDI: 0040
[68812.480642] RBP: 7f8a779e16f0 R08: 7f8a779e1558 R09: 0001
[68812.480643] R10: 2000 R11: 0293 R12: 2000
[68812.480644] R13: 01790a6b8dd0 R14: 01790ddf3c00 R15: 0128
[68812.480647] INFO: task ThreadPoolSingl:9908 blocked for more than 480 seconds.
[68812.480648] Not tainted 4.14.184-nopreempt-AMDGPU-dav9 #1
[68812.480649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68812.480650] ThreadPoolSingl D0 9908 9082 0x0080
[68812.480651] Call Trace:
[68812.480653] __schedule+0x29e/0x6c0
[68812.480655] schedule+0x32/0x80
[68812.480657] schedule_preempt_disabled+0xa/0x10
[68812.480658] __mutex_lock.isra.1+0x26b/0x4e0
[68812.480664] ? do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480668] do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480672] ? reiserfs_lookup+0xb5/0x160 [reiserfs]
[68812.480677] journal_begin+0x80/0x140 [reiserfs]
[68812.480681] reiserfs_create+0xfc/0x210 [reiserfs]
[68812.480685] path_openat+0x1419/0x14d0
[68812.480687] ? futex_wake+0x91/0x170
[68812.480689] do_filp_open+0x99/0x110
[68812.480693] ? __check_object_size+0xfa/0x1a0
[68812.480695] ? __alloc_fd+0x3d/0x160
[68812.480696] ? do_sys_open+0x12e/0x210
[68812.480698] do_sys_open+0x12e/0x210
[68812.480700] do_syscall_64+0x6e/0x110
[68812.480702] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[68812.480703] RIP: 0033:0x7f8a8c48770d
[68812.480704] RSP: 002b:7f8a73eeb220 EFLAGS: 0293 ORIG_RAX: 0002
[68812.480706] RAX: ffda RBX: RCX: 7f8a8c48770d
[68812.480707] RDX: 0180 RSI: 00c2 RDI: 0179286b1ae0
[68812.480708] RBP: 0003a2f8 R08: c0c1 R09: 01792b1b04e0
[68812.480709] R10: R11: 0293 R12: 0179286b1b1a
[68812.480710]
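On the gdb question: __schedule is built into vmlinux rather than reiserfs.ko, which may be why gdb could not find it in the module. In both traces the reiserfs frame just above the lock is do_journal_begin_r+0xbe, i.e. the caller that is blocked in __mutex_lock. Frames printed with a leading "?" are addresses the unwinder found on the stack but could not confirm, so they are usually best ignored. gdb can map a "symbol+offset" pair to a source line with its "list *(symbol+offset)" form, provided the object was built with debug info (CONFIG_DEBUG_INFO). The following is a minimal POSIX shell sketch; trace_to_gdb is a hypothetical helper made up for this example, and it assumes the trace has been saved from dmesg into a file with one log entry per line.

```shell
#!/bin/sh
# Hypothetical helper (not an existing kernel script): trace_to_gdb TRACE_FILE MODULE
# Reads a hung-task report saved one log entry per line and prints one gdb
# "list *(symbol+offset)" command per reliable frame belonging to MODULE.
# Pass "vmlinux" as MODULE to select core-kernel frames (no [module] tag).
trace_to_gdb() {
    trace_file=$1
    module=$2
    # Frames printed as "] ? symbol" are unreliable unwinder guesses; drop them.
    grep -v '] ? ' "$trace_file" |
    if [ "$module" = vmlinux ]; then
        grep -v ']$'             # core-kernel frames end without a [module] tag
    else
        grep "\[$module]\$"      # module frames end in e.g. "[reiserfs]"
    fi |
    # Keep "symbol+0xoffset"; drop the timestamp, the "/0xsize" and any tag.
    sed -n 's/^.*] \([A-Za-z0-9_.]*+0x[0-9a-f]*\)\/0x[0-9a-f]*.*$/list *(\1)/p' |
    # Remove repeated frames while keeping call-chain order.
    awk '!seen[$0]++'
}

# Example usage (file names and module path are assumptions):
#   trace_to_gdb hung.txt reiserfs > reiserfs.gdb
#   gdb -batch -x reiserfs.gdb /path/to/reiserfs.ko
```

For the traces above, the reiserfs frames that survive the filter are do_journal_begin_r+0xbe, journal_begin+0x80, reiserfs_dirty_inode+0x3d and reiserfs_create+0xfc. Resolving the core-kernel frames such as __mutex_lock.isra.1+0x26b needs the vmlinux that matches the running kernel, also built with debug info.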
Re: PROBLEM: IO lockup on reiserfs FS.
I should add that, in chasing down this bug, I have tried all the available IO schedulers (noop, deadline, and cfq). CFQ is the one I'm now using to reproduce this. Also, I don't know if it makes a difference, but when the system first starts up, it takes 20 minutes to get from the login manager to having my web browsers restart and reload all their pages from online. It might be because there is a lot of IO going on, or it might be that there are several stalls in the scheduling that are just not bad enough to trigger a hung task report like the ones above.
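Because the IO scheduler is chosen per block device, it can help to confirm which one each member disk of md7 is actually running while reproducing. The sysfs file /sys/block/<dev>/queue/scheduler lists the available schedulers and brackets the active one (for example "noop deadline [cfq]" on a 4.14 legacy-queue disk). The following is a minimal POSIX shell sketch; active_scheduler and report_schedulers are hypothetical names for this example, and the device names passed in are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) entry of a scheduler file,
# e.g. "noop deadline [cfq]" -> "cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Hypothetical helper: report the active scheduler for each device named,
# e.g. report_schedulers sda sdb sdc (whichever disks make up md7).
report_schedulers() {
    for dev in "$@"; do
        f=/sys/block/$dev/queue/scheduler
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$dev" "$(active_scheduler "$f")"
        else
            printf '%s: no scheduler file\n' "$dev"
        fi
    done
}

# Switching (as root) writes one of the listed names, e.g.:
#   echo deadline > /sys/block/sda/queue/scheduler
```

Checking every member disk, rather than one, rules out the case where only some of the array's disks were switched.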