Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-17 Thread David Niklas
On Wed, 5 Aug 2020 12:51:41 -0700
Linus Torvalds  wrote:
> On Wed, Aug 5, 2020 at 9:53 AM  wrote:
> >
> > It's been over 1 week since I sent this into the reiserfs-devel
> > mailing list. I'm escalating this as the kernel docs recommend.
> > I'm still willing to help debug and test a fix for this problem.  
> 
> The thing is, you're using an ancient 4.14 kernel, and a filesystem
> that isn't really maintained any more. You'll find very few people
> interested in trying to debug that combination.
> 
> You *might* have more luck with a more modern kernel, but even then
> ... reiserfs?
> 
>   Linus
> 

This bug appears to have been fixed somewhere between the 4.14.X and the
5.17.X series. I don't know why the fix wasn't backported, but it doesn't
really matter to me, as I can run the newer kernel.

Thanks everyone for your help.
David


Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-05 Thread Metztli Information Technology
 
On Wed, Aug 5, 2020 at 5:01 PM  wrote:

> On Wed, 5 Aug 2020 12:51:41 -0700
> Linus Torvalds  wrote:
> > On Wed, Aug 5, 2020 at 9:53 AM  wrote:
> > >
> > > It's been over 1 week since I sent this into the reiserfs-devel
> > > mailing list. I'm escalating this as the kernel docs recommend.
> > > I'm still willing to help debug and test a fix for this problem.  
> > 
> > The thing is, you're using an ancient 4.14 kernel, 
> 
> Sorry, I didn't realize kernel development went that fast.
> I did try to go to the 5.X series, but the AMDGPU drivers don't work on
> my SI card anymore (I need to bisect which takes time and many re-boots
> to find the problematic commit).
> I'll try the Radeon-SI driver and see if I can reproduce this reliably.
> 
> > and a filesystem
> > that isn't really maintained any more. You'll find very few people
> > interested in trying to debug that combination.
> > 
> > You *might* have more luck with a more modern kernel, but even then
> > ... reiserfs?
> > 
> >               Linus
> > 
> 
> Why does no one (I've met others who share a similar sentiment), like
> reiserfs?
Could be because 'others' are all 'virtuous' individuals, employed by 
'virtuous' corporations, headquartered in 'virtuous' Western countries, whose 
'virtuous' governments are able to finance the finest hasbara...er, propaganda, 
that a corporatocracy...er, 'democracy', can buy.

> I'm not looking for fight, I'm incredulous. It's a great FS
> that survives oops-es, power failures, and random crashes very very well.
> It's the only FLOSS FS with tail packing.
On a more sober note, Reiser4, Software Framework Release Number (SFRN) 4.0.2, 
is stable, and its features supersede those you appreciate in reiserfs, as 
Edward stated in his subsequent reply.
> 
> Thanks,
> David


Best Professional Regards.

-- 
Jose R R
http://metztli.it
-
Download Metztli Reiser4: Debian Buster w/ Linux 5.5.19 AMD64
-
feats ZSTD compression https://sf.net/projects/metztli-reiser4/
---
Official current Reiser4 resources: https://reiser4.wiki.kernel.org/


Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-05 Thread Edward Shishkin

On 08/06/2020 02:01 AM, hgntk...@vfemail.net wrote:

> On Wed, 5 Aug 2020 12:51:41 -0700
> Linus Torvalds  wrote:
> > On Wed, Aug 5, 2020 at 9:53 AM  wrote:
> > >
> > > It's been over 1 week since I sent this into the reiserfs-devel
> > > mailing list. I'm escalating this as the kernel docs recommend.
> > > I'm still willing to help debug and test a fix for this problem.
> > 
> > The thing is, you're using an ancient 4.14 kernel,
> 
> Sorry, I didn't realize kernel development went that fast.
> I did try to go to the 5.X series, but the AMDGPU drivers don't work on
> my SI card anymore (I need to bisect which takes time and many re-boots
> to find the problematic commit).
> I'll try the Radeon-SI driver and see if I can reproduce this reliably.
> 
> > and a filesystem
> > that isn't really maintained any more. You'll find very few people
> > interested in trying to debug that combination.
> > 
> > You *might* have more luck with a more modern kernel, but even then
> > ... reiserfs?
> > 
> >    Linus
> 
> Why does no one (I've met others who share a similar sentiment), like
> reiserfs? I'm not looking for fight, I'm incredulous. It's a great FS
> that survives oops-es, power failures, and random crashes very very well.
> It's the only FLOSS FS with tail packing.
> 
> Thanks,
> David



Hi David,

The "tail packing" feature that you need is brought to perfection
in the Reiser4 file system. Other file systems either don't provide tight
packing of records in the storage tree, or they are read-only. You just
need to manually patch (*) the kernel and install a pair of user-space
packages (**).

The latest code (against Linux 5.7) is stable. For older kernels you
will need a backport of some fixups (***); we can prepare it for you.

Reiser4 is fully supported. If you run into any problems (including
partition check/repair), send a message to the reiserfs-devel mailing list.

(*)   https://reiser4.wiki.kernel.org/index.php/Reiser4_Howto
      https://sourceforge.net/projects/reiser4/files/
(**)  https://sourceforge.net/projects/reiser4/files/reiser4-utils/
(***) https://marc.info/?l=reiserfs-devel&m=158086248927420&w=2

Thanks,
Edward.


Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-05 Thread Hgntkwis
On Wed, 5 Aug 2020 12:51:41 -0700
Linus Torvalds  wrote:
> On Wed, Aug 5, 2020 at 9:53 AM  wrote:
> >
> > It's been over 1 week since I sent this into the reiserfs-devel
> > mailing list. I'm escalating this as the kernel docs recommend.
> > I'm still willing to help debug and test a fix for this problem.  
> 
> The thing is, you're using an ancient 4.14 kernel, 

Sorry, I didn't realize kernel development went that fast.
I did try to go to the 5.X series, but the AMDGPU drivers don't work on
my SI card anymore (I need to bisect, which takes time and many reboots,
to find the problematic commit).
I'll try the Radeon-SI driver and see if I can reproduce this reliably.

> and a filesystem
> that isn't really maintained any more. You'll find very few people
> interested in trying to debug that combination.
> 
> You *might* have more luck with a more modern kernel, but even then
> ... reiserfs?
> 
>   Linus
> 

Why does no one like reiserfs (I've met others who share a similar
sentiment)? I'm not looking for a fight; I'm incredulous. It's a great FS
that survives oopses, power failures, and random crashes very, very well.
It's the only FLOSS FS with tail packing.

Thanks,
David


Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-05 Thread Linus Torvalds
On Wed, Aug 5, 2020 at 9:53 AM  wrote:
>
> It's been over 1 week since I sent this into the reiserfs-devel mailing
> list. I'm escalating this as the kernel docs recommend.
> I'm still willing to help debug and test a fix for this problem.

The thing is, you're using an ancient 4.14 kernel, and a filesystem
that isn't really maintained any more. You'll find very few people
interested in trying to debug that combination.

You *might* have more luck with a more modern kernel, but even then
... reiserfs?

  Linus


Re: PROBLEM: IO lockup on reiserfs FS.

2020-08-05 Thread Hgntkwis
It's been over 1 week since I sent this into the reiserfs-devel mailing
list. I'm escalating this as the kernel docs recommend.
I'm still willing to help debug and test a fix for this problem.

"Given enough eyeballs, all bugs are shallow".
This bug is visible; could we please quash it?

Original message:
https://lkml.org/lkml/2020/7/28/1435
Filed bug:
https://bugzilla.kernel.org/show_bug.cgi?id=208719

Thanks,
David


PROBLEM: IO lockup on reiserfs FS.

2020-07-28 Thread David Niklas
Hello,

When running rtorrent (in sync-to-file mode), my kernel eventually stops
writing to the disk, causing all access to md7 to hang; eventually the
kernel becomes totally unresponsive.

EDIT: I have since gotten a response in the bug tracker, and I am filing
this with the reiserfs maintainers.

(I'd find the file and report to the correct maintainer, but the docs
are incomplete when it comes to discovering the source of hung tasks in
a case like mine. I have filed bugs to correct this.)
I still have no clue how to get gdb to print out exactly where this is
happening. It can't find the __schedule function in the kernel or in the
reiserfs.ko object file. I'm not even sure __schedule is the problem, as
it seems to be lock-related, which would point to __mutex_lock.isra.1
instead.

EDIT: I've managed to verify that all disks are accessible
(smartctl -a /dev/sdX) and that the RAID array is working and accessible
(mdadm --detail, and echoing check/idle to the array and seeing progress).

Keywords: Hung tasks, Lockup, reiserfs, RAID.

Linux version 4.14.184-nopreempt-AMDGPU-dav9 (root@Phenom-II-x6) (gcc
version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Fri Jun 19
17:02:34 UTC 2020

I didn't have Internet access before 4.9.X, so I can't say whether any
earlier version of the Linux kernel was free of this problem.

[68812.480459]   Not tainted 4.14.184-nopreempt-AMDGPU-dav9 #1
[68812.480464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68812.480469] CacheThread_Blo D    0  9414   9082 0x0080
[68812.480476] Call Trace:
[68812.480494]  __schedule+0x29e/0x6c0
[68812.480505]  schedule+0x32/0x80
[68812.480513]  schedule_preempt_disabled+0xa/0x10
[68812.480520]  __mutex_lock.isra.1+0x26b/0x4e0
[68812.480550]  ? do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480570]  do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480586]  ? __switch_to_asm+0x35/0x70
[68812.480588]  ? __switch_to_asm+0x41/0x70
[68812.480590]  ? __switch_to_asm+0x35/0x70
[68812.480592]  ? __switch_to_asm+0x41/0x70
[68812.480593]  ? __switch_to_asm+0x35/0x70
[68812.480595]  ? __switch_to_asm+0x41/0x70
[68812.480597]  ? __switch_to_asm+0x35/0x70
[68812.480601]  journal_begin+0x80/0x140 [reiserfs]
[68812.480606]  reiserfs_dirty_inode+0x3d/0xa0 [reiserfs]
[68812.480609]  ? __switch_to+0x1ee/0x3f0
[68812.480610]  ? __switch_to+0x1ee/0x3f0
[68812.480612]  __mark_inode_dirty+0x163/0x350
[68812.480615]  generic_update_time+0x79/0xc0
[68812.480617]  ? current_time+0x38/0x70
[68812.480619]  file_update_time+0xbe/0x110
[68812.480622]  __generic_file_write_iter+0x99/0x1b0
[68812.480624]  generic_file_write_iter+0xe2/0x1c0
[68812.480626]  __vfs_write+0x102/0x180
[68812.480628]  vfs_write+0xb0/0x190
[68812.480630]  SyS_pwrite64+0x90/0xb0
[68812.480632]  do_syscall_64+0x6e/0x110
[68812.480634]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[68812.480637] RIP: 0033:0x7f8a92976983
[68812.480638] RSP: 002b:7f8a779e1630 EFLAGS: 0293 ORIG_RAX: 0012
[68812.480640] RAX: ffda RBX: 7f8a779e1640 RCX: 7f8a92976983
[68812.480641] RDX: 0128 RSI: 01790ddf3c00 RDI: 0040
[68812.480642] RBP: 7f8a779e16f0 R08: 7f8a779e1558 R09: 0001
[68812.480643] R10: 2000 R11: 0293 R12: 2000
[68812.480644] R13: 01790a6b8dd0 R14: 01790ddf3c00 R15: 0128
[68812.480647] INFO: task ThreadPoolSingl:9908 blocked for more than 480 seconds.
[68812.480648]   Not tainted 4.14.184-nopreempt-AMDGPU-dav9 #1
[68812.480649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[68812.480650] ThreadPoolSingl D    0  9908   9082 0x0080
[68812.480651] Call Trace:
[68812.480653]  __schedule+0x29e/0x6c0
[68812.480655]  schedule+0x32/0x80
[68812.480657]  schedule_preempt_disabled+0xa/0x10
[68812.480658]  __mutex_lock.isra.1+0x26b/0x4e0
[68812.480664]  ? do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480668]  do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480672]  ? reiserfs_lookup+0xb5/0x160 [reiserfs]
[68812.480677]  journal_begin+0x80/0x140 [reiserfs]
[68812.480681]  reiserfs_create+0xfc/0x210 [reiserfs]
[68812.480685]  path_openat+0x1419/0x14d0
[68812.480687]  ? futex_wake+0x91/0x170
[68812.480689]  do_filp_open+0x99/0x110
[68812.480693]  ? __check_object_size+0xfa/0x1a0
[68812.480695]  ? __alloc_fd+0x3d/0x160
[68812.480696]  ? do_sys_open+0x12e/0x210
[68812.480698]  do_sys_open+0x12e/0x210
[68812.480700]  do_syscall_64+0x6e/0x110
[68812.480702]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[68812.480703] RIP: 0033:0x7f8a8c48770d
[68812.480704] RSP: 002b:7f8a73eeb220 EFLAGS: 0293 ORIG_RAX: 0002
[68812.480706] RAX: ffda RBX:  RCX: 7f8a8c48770d
[68812.480707] RDX: 0180 RSI: 00c2 RDI: 0179286b1ae0
[68812.480708] RBP: 0003a2f8 R08: c0c1 R09: 01792b1b04e0
[68812.480709] R10:  R11: 0293 R12: 0179286b1b1a
[68812.480710]
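Each call-trace entry in the log above has the form `symbol+offset/size [module]`, and a leading `?` marks an address the unwinder found on the stack but could not confirm as part of the call chain. Reading only the confirmed frames shows both tasks blocked on a mutex inside reiserfs's `do_journal_begin_r`. A minimal Python sketch, assuming one entry per line as shown here (the helper `parse_frame` and the `Frame` type are hypothetical illustrations, not kernel tools), that splits such an entry and lists the reliable reiserfs frames:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches one call-trace entry, e.g.
#   "[68812.480570]  do_journal_begin_r+0xbe/0x390 [reiserfs]"
# An optional "?" before the symbol marks an unconfirmed (unreliable) frame.
FRAME_RE = re.compile(
    r"^\[\s*(?P<ts>[\d.]+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?$"
)


@dataclass
class Frame:
    func: str
    offset: int              # offset of the return address into func, in bytes
    size: int                # total size of func, in bytes
    module: Optional[str]    # None for code built into the kernel image
    reliable: bool


def parse_frame(line: str) -> Optional[Frame]:
    """Hypothetical helper: parse one trace line, or return None if it is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(
        func=m["func"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        reliable=m["unreliable"] is None,
    )


# A few lines copied from the first trace above.
trace = """\
[68812.480476] Call Trace:
[68812.480520]  __mutex_lock.isra.1+0x26b/0x4e0
[68812.480550]  ? do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480570]  do_journal_begin_r+0xbe/0x390 [reiserfs]
[68812.480601]  journal_begin+0x80/0x140 [reiserfs]
[68812.480606]  reiserfs_dirty_inode+0x3d/0xa0 [reiserfs]
[68812.480612]  __mark_inode_dirty+0x163/0x350
"""

frames = [f for f in map(parse_frame, trace.splitlines()) if f is not None]

# Print the confirmed reiserfs frames, innermost first.
for f in frames:
    if f.reliable and f.module == "reiserfs":
        print(f"{f.func}+{f.offset:#x}/{f.size:#x}")
```

The `func+offset/size` string printed here is the form that the kernel tree's `scripts/faddr2line` accepts to map a frame to a source line, provided the module was built with debug info.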

Re: PROBLEM: IO lockup on reiserfs FS.

2020-07-28 Thread David Niklas
I should add that in chasing down this bug I have tried all the available
IO schedulers (noop, deadline, and cfq). CFQ is the one I'm now using to
reproduce this.
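On a 4.14 kernel the scheduler for each disk is exposed in `/sys/block/<dev>/queue/scheduler`, which lists the available schedulers with the active one in brackets (for example `noop deadline [cfq]`); writing a name back to that file as root switches schedulers. A minimal Python sketch, assuming that file format; `sda` is a placeholder device, and `parse_scheduler`, `read_scheduler`, and `set_scheduler` are hypothetical helpers, not part of any existing tool:

```python
from pathlib import Path
from typing import List, Tuple


def parse_scheduler(text: str) -> Tuple[str, List[str]]:
    """Return (active, available) from a line such as 'noop deadline [cfq]'."""
    active = ""
    available: List[str] = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # brackets mark the scheduler currently in use
            active = token
        available.append(token)
    return active, available


def read_scheduler(dev: str) -> Tuple[str, List[str]]:
    """Read the scheduler setting of a block device, e.g. dev='sda' (placeholder)."""
    return parse_scheduler(Path(f"/sys/block/{dev}/queue/scheduler").read_text())


def set_scheduler(dev: str, name: str) -> None:
    """Switch the device to another scheduler by writing its name (needs root)."""
    Path(f"/sys/block/{dev}/queue/scheduler").write_text(name)


print(parse_scheduler("noop deadline [cfq]"))
# → ('cfq', ['noop', 'deadline', 'cfq'])
```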

Also, I don't know if it makes a difference, but when the system first
starts up it takes 20 minutes to get from the login manager to having my
web browsers restarted with all their pages reloaded from the web. That
might be because there is a lot of IO going on, or there might be several
scheduling stalls that just aren't bad enough to cause a hung-task report
like the one above.