Re: PROBLEM: Reiser4 hard lockup
On 10/27/2020 08:36 PM, Theodore Y. Ts'o wrote:
> On Tue, Oct 27, 2020 at 01:53:31AM +0100, Edward Shishkin wrote:
> > > > reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
> > > > system utilities should not be used to check/fix media formatted 'a
> > > > priori' in SFRN 4.0.2 and vice-versa.
> > >
> > > Honestly, this is the first time I've heard about a Linux FS having
> > > versioning other than a major one
> >
> > This is because, unlike other Linux file systems, reiser4 is a
> > framework.
> >
> > In vanilla kernel having a filesystem-as-framework is discouraged for
> > ideological reasons. As they explained: "nobody's interested in
> > plugins". A huge monolithic mess without any internal structure -
> > welcome :)
>
> I wouldn't call it an ideological problem, but more about wanting to
> assure interoperability issues and wanting to reduce confusion on the
> part of users, especially if images get moved between systems. There
> are also plenty of ways of introducing internal structure and code
> cleanliness without going completely undisciplined with respect to
> on-disk format extensions. :-)

Have you made this up right now? I remember very well all the requests
for merging reiser4 upstream (in 2004, 2005 and 2006) - compatibility
concerns were never raised. Besides, it is not a problem to add
mechanisms for keeping track of compatibility at any time.

> Finally, I'll note that ext 2/3/4 does have a rather fine-grained set
> of feature flags, with specific rules about what the kernel --- and
> e2fsck --- should do if it finds a feature bit it doesn't understand
> in the incompat, ro_compat, and compat feature flags set. This is
> especially helpful since we have multiple implementations of ext 2/3/4
> out there (in FreeBSD, the GRUB bootloader, GNU HURD, Fuchsia, etc.)
> and so using feature bits allows for safe and reliable
> interoperability with the user being warned if they can safely only
> mount the file system read-only, or not at all, if the file system has
> some new feature that their current OS version does not support.
> We can also give appropriate warnings if they are using an
> insufficiently recent version of the userspace tools.

"Fine-grained" means per-volume decisions: mount / not mount / read-only
mount? It is not even yesterday's technique. It is the ice age...

Edward.
Re: PROBLEM: Reiser4 hard lockup
On Tue, Oct 27, 2020 at 01:53:31AM +0100, Edward Shishkin wrote:
> > > reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
> > > system utilities should not be used to check/fix media formatted 'a
> > > priori' in SFRN 4.0.2 and vice-versa.
> >
> > Honestly, this is the first time I've heard about a Linux FS having
> > versioning other than a major one
>
> This is because, unlike other Linux file systems, reiser4 is a
> framework.
>
> In vanilla kernel having a filesystem-as-framework is discouraged for
> ideological reasons. As they explained: "nobody's interested in
> plugins". A huge monolithic mess without any internal structure -
> welcome :)

I wouldn't call it an ideological problem, but more about wanting to
assure interoperability issues and wanting to reduce confusion on the
part of users, especially if images get moved between systems. There
are also plenty of ways of introducing internal structure and code
cleanliness without going completely undisciplined with respect to
on-disk format extensions. :-)

Finally, I'll note that ext 2/3/4 does have a rather fine-grained set
of feature flags, with specific rules about what the kernel --- and
e2fsck --- should do if it finds a feature bit it doesn't understand
in the incompat, ro_compat, and compat feature flags set. This is
especially helpful since we have multiple implementations of ext 2/3/4
out there (in FreeBSD, the GRUB bootloader, GNU HURD, Fuchsia, etc.)
and so using feature bits allows for safe and reliable
interoperability with the user being warned if they can safely only
mount the file system read-only, or not at all, if the file system has
some new feature that their current OS version does not support. We
can also give appropriate warnings if they are using an insufficiently
recent version of the userspace tools.

Cheers,

- Ted
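
The feature-flag rules described in the message above can be sketched as a small decision function. This is only an illustration in Python, not the kernel's or e2fsck's code: the three flag classes (compat, ro_compat, incompat) come from the message, while the bit values, the SUPPORTED_* masks and the function name are hypothetical.

```python
from enum import Enum


class MountDecision(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"
    REFUSE = "refuse"


# Hypothetical masks of the feature bits this implementation understands.
SUPPORTED_RO_COMPAT = 0b0011
SUPPORTED_INCOMPAT = 0b0111


def mount_decision(compat, ro_compat, incompat,
                   supported_ro_compat=SUPPORTED_RO_COMPAT,
                   supported_incompat=SUPPORTED_INCOMPAT):
    """Decide how to mount a volume from its three feature-flag words."""
    # An unknown incompat bit means the on-disk format cannot be
    # interpreted safely at all, so the mount is refused.
    if incompat & ~supported_incompat:
        return MountDecision.REFUSE
    # An unknown ro_compat bit means reading is safe but writing is not.
    if ro_compat & ~supported_ro_compat:
        return MountDecision.READ_ONLY
    # Unknown compat bits do not restrict mounting, so `compat` is
    # accepted here only for completeness and is not inspected.
    return MountDecision.READ_WRITE
```

The order of the checks matters: an unknown incompat bit wins over an unknown ro_compat bit, so a volume carrying both is refused rather than mounted read-only.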
Re: PROBLEM: Reiser4 hard lockup
On 10/26/2020 02:07 AM, David Niklas wrote:
> I'll reply to both of you in this email.
>
> On Sun, 25 Oct 2020 02:04:22 -0700 (PDT)
> Metztli Information Technology wrote:
> > Niltze, David-
> >
> > A few observations are in order below:
> >
> > On Sat, Oct 24, 2020 at 1:39 PM David Niklas wrote:
> > >
> > > Hello,
> > >
> > reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
> > system utilities should not be used to check/fix media formatted 'a
> > priori' in SFRN 4.0.2 and vice-versa.
>
> Honestly, this is the first time I've heard about a Linux FS having
> versioning other than a major one

This is because, unlike other Linux file systems, reiser4 is a
framework.

In vanilla kernel having a filesystem-as-framework is discouraged for
ideological reasons. As they explained: "nobody's interested in
plugins". A huge monolithic mess without any internal structure -
welcome :)
Re: PROBLEM: Reiser4 hard lockup
On Sun, Oct 25, 2020 at 6:10 PM David Niklas wrote:
>
> I'll reply to both of you in this email.
>
> On Sun, 25 Oct 2020 02:04:22 -0700 (PDT)
> Metztli Information Technology wrote:
> > Niltze, David-
> >
> > A few observations are in order below:
> >
> > On Sat, Oct 24, 2020 at 1:39 PM David Niklas
> > wrote:
> > >
> > > Hello,
> > >
> > reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
> > system utilities should not be used to check/fix media formatted 'a
> > priori' in SFRN 4.0.2 and vice-versa.
>
> Honestly, this is the first time I've heard about a Linux FS having
> versioning other than a major one (NTFS IIRC is infamous for its
> incompatibilities).
>
> > Conclusion: I suggest you read up on the published reiser4
> > documentation and build *both* your kernel and reiser4progs utilities
> > conforming to the more recent stable SFRN 4.0.2.
>
> I religiously read the reiser4 documentation prior to changing my FS.
> Nowhere do I recall seeing, on the wiki or in the man pages, that there
> was a need to use a newer reiser4progs.
> Ok, here it is:
> https://reiser4.wiki.kernel.org/index.php/Reiser4_development_model
> I'm not normally going to read a page that looks like it's for developers
> as an end user, which is why I didn't initially read it.
>
> > Prior to build reiser4progs 1.2.1 SFRN 4.0.2:
> > libaal-1.0.7.tar.gz
> > < https://sourceforge.net/projects/reiser4/files/reiser4-utils/ >
> >
> > then build reiser4progs-1.2.1.tar.gz SFRN 4.0.2
> > < https://sourceforge.net/projects/reiser4/files/reiser4-utils/reiser4progs/ >
> >
> > as any reiser4 patches for kernel 4.14 and higher conform to SFRN 4.0.2:
> > https://sourceforge.net/projects/reiser4/files/reiser4-for-linux-5.x/
> >
> > citing, for the third time, reference email documenting the change,
> > i.e. '2017-11-26 23:01:53 reiser4 SFRN 4.0.2 is not your father's
> > reiser4 SFRN 4.0.1'
> > < https://marc.info/?l=reiserfs-devel&m=151173731709826&w=2 >
> >
> > If you are going to use reiser4 kernel patches for linux 4.15.xy -
> > 5.4.5 range, please make sure to *also* apply: [PATCH] reiser4: prevent
> > system lockups:
> > < https://marc.info/?l=reiserfs-devel&m=158086248927420&w=2 >
>
> I was using 5.7.13 at the time. Should the crashes still be happening
> because this patch is missing?

Only "If you are going to use reiser4 kernel patches for linux 4.15.xy -
5.4.5 range,"

> It seems odd you'd mention it knowing
> which kernel version I was using.

I have to be proactive and anticipate a potential decision to use a
lower version ;-)
i.e., I am really baffled at how you were using reiser4 SFRN 4.0.2 patch
for linux 5.7.13 -- yet built and were using old reiser4progs 1.1.x SFRN
4.0.1 :)

It is not as if there were not reiser4 resources out there. One of the
reasons I make available a reiser4 hack of Debian netboot ISOs on
SourceForge is to provide some sort of reference implementation. If you
do not like/trust the Debian metaframework OS hack installed onto your
computer, fair enough: you can use just the bootable ISO to execute the
commands suggested elsewhere by Ed to find out the proper SFRN and/or
versions of the kernel and reiser4progs utilities which your *own* build
might strive to match.

>
> On Sun, 25 Oct 2020 13:50:15 +0100
> Edward Shishkin wrote:
> > On 10/24/2020 10:36 PM, David Niklas wrote:
> > > Hello,
> > >
> >
> > Hi David,
> >
> > Thanks for the comprehensive report, which is definitely useful!
> > Below you can find some hints and comments.
> >
> > (:
> >
> > It's a pity..
> > To be honest, I received complaints that reiser4 doesn't make
> > a friendship with torrents long time ago. Unfortunately, I am in Europe,
> > where it is impossible to use torrents that simply, without conflicts
> > with local legislation. Respectively, I am not able to reproduce it,
> > and the problem is still unfixed..
> >
> I might try to reproduce this later and log the actual write patterns so
> you can reproduce these crashes. Obviously, I'll have to learn how to
> first.
>
> > reiser4 mount option "dont_load_bitmap" is your friend.
>
> I knew about that, but I'm uncertain if it would change how reiser4 works
> and then it will not cause the crash.
>
> > > I had also to manually chew through all the kernel .o files to find
> > > where the kernel broke at (also attached).
> > >
> > > The command I used to create the reiser4 FS was:
> > > mkfs.reiser4 -o
> > > create=reg40,fibration=ext_3_fibre,hash=r5_hash,key=key_large,node=node40,compress=lzo1,compressMode=conv
> > > /dev/md7p1
> > > I wanted to use reg40 as opposed to ccreg40 because I wanted an
> > > unencrypted partition.
> >
> >
> > You got confused. Reiser4 doesn't support encryption without special
> > patches (which are not public). With "create=reg40" you get a "classic"
> > setup without compression.
> >
> > There is a "getting started" page, which provides some recommendations
> > on reiser4 mkfs and mount options:
> >
Re: PROBLEM: Reiser4 hard lockup
I'll reply to both of you in this email.

On Sun, 25 Oct 2020 02:04:22 -0700 (PDT)
Metztli Information Technology wrote:
> Niltze, David-
>
> A few observations are in order below:
>
> On Sat, Oct 24, 2020 at 1:39 PM David Niklas
> wrote:
> >
> > Hello,
> >
> reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
> system utilities should not be used to check/fix media formatted 'a
> priori' in SFRN 4.0.2 and vice-versa.

Honestly, this is the first time I've heard about a Linux FS having
versioning other than a major one (NTFS IIRC is infamous for its
incompatibilities).

> Conclusion: I suggest you read up on the published reiser4
> documentation and build *both* your kernel and reiser4progs utilities
> conforming to the more recent stable SFRN 4.0.2.

I religiously read the reiser4 documentation prior to changing my FS.
Nowhere do I recall seeing, on the wiki or in the man pages, that there
was a need to use a newer reiser4progs.
Ok, here it is:
https://reiser4.wiki.kernel.org/index.php/Reiser4_development_model
I'm not normally going to read a page that looks like it's for developers
as an end user, which is why I didn't initially read it.

> Prior to build reiser4progs 1.2.1 SFRN 4.0.2:
> libaal-1.0.7.tar.gz
> < https://sourceforge.net/projects/reiser4/files/reiser4-utils/ >
>
> then build reiser4progs-1.2.1.tar.gz SFRN 4.0.2
> < https://sourceforge.net/projects/reiser4/files/reiser4-utils/reiser4progs/ >
>
> as any reiser4 patches for kernel 4.14 and higher conform to SFRN 4.0.2:
> https://sourceforge.net/projects/reiser4/files/reiser4-for-linux-5.x/
>
> citing, for the third time, reference email documenting the change,
> i.e. '2017-11-26 23:01:53 reiser4 SFRN 4.0.2 is not your father's
> reiser4 SFRN 4.0.1'
> < https://marc.info/?l=reiserfs-devel&m=151173731709826&w=2 >
>
> If you are going to use reiser4 kernel patches for linux 4.15.xy -
> 5.4.5 range, please make sure to *also* apply: [PATCH] reiser4: prevent
> system lockups:
> < https://marc.info/?l=reiserfs-devel&m=158086248927420&w=2 >

I was using 5.7.13 at the time. Should the crashes still be happening
because this patch is missing? It seems odd you'd mention it knowing
which kernel version I was using.

On Sun, 25 Oct 2020 13:50:15 +0100
Edward Shishkin wrote:
> On 10/24/2020 10:36 PM, David Niklas wrote:
> > Hello,
> >
>
> Hi David,
>
> Thanks for the comprehensive report, which is definitely useful!
> Below you can find some hints and comments.
>
> (:
>
> It's a pity..
> To be honest, I received complaints that reiser4 doesn't make
> a friendship with torrents long time ago. Unfortunately, I am in Europe,
> where it is impossible to use torrents that simply, without conflicts
> with local legislation. Respectively, I am not able to reproduce it,
> and the problem is still unfixed..
>

I might try to reproduce this later and log the actual write patterns so
you can reproduce these crashes. Obviously, I'll have to learn how to
first.

> reiser4 mount option "dont_load_bitmap" is your friend.

I knew about that, but I'm uncertain if it would change how reiser4 works
and then it will not cause the crash.

> > I had also to manually chew through all the kernel .o files to find
> > where the kernel broke at (also attached).
> >
> > The command I used to create the reiser4 FS was:
> > mkfs.reiser4 -o
> > create=reg40,fibration=ext_3_fibre,hash=r5_hash,key=key_large,node=node40,compress=lzo1,compressMode=conv
> > /dev/md7p1
> > I wanted to use reg40 as opposed to ccreg40 because I wanted an
> > unencrypted partition.
>
>
> You got confused. Reiser4 doesn't support encryption without special
> patches (which are not public). With "create=reg40" you get a "classic"
> setup without compression.
>
> There is a "getting started" page, which provides some recommendations
> on reiser4 mkfs and mount options:
> https://reiser4.wiki.kernel.org/index.php/Reiser4_Howto

Ah, I read that page 4 times but thought it out of date because the
plugin description said something else.

> > Likewise, I changed the fibration to ext_3_fibre
> > from ext_1_fibre. Other than that, everything is set to its defaults.
> > Interestingly, if I try to set the key to short and change the mode to
> > tea (a time compute trade off AFAIK), it crashes mkfs.reiser4.
> > I need to report this to the developers.
>
> Short keys is an exotic option (are you restricted in disk space?).

Quite the contrary, I wanted an excuse to try using tea. IDK what the
key lengths are (I've been really curious about that), but I decided to
try using tea and short hashes as a fun exercise in a time/compute
trade off.

> But that crash needs to be fixed, of course. I'll create a ticket.

Nice!

> > When trying to remount the FS after this crash I got an error from
> > fsck that I needed to rebuild the super block. Considering that all
> > transactions are atomic, this was quite a surprise to me.
>
> This failed because the format version was somehow
Re: PROBLEM: Reiser4 hard lockup
Niltze, David-

A few observations are in order below:

On Sat, Oct 24, 2020 at 1:39 PM David Niklas wrote:
>
> Hello,
>
> # Intro
> Pardon my tardiness in reporting this, I was stalling my disk upgrade to
> help test a fix for a reiserfs problem. I needed to get my life going
> again before taking the time to report this.
> This is a heads up for a serious problem. I no longer use reiser4
> because I can't have my kernel hard and soft locking up within
> hours of booting, and I no longer use 5.7.13. Therefore, I can't test a
> fix for this, but I am willing to test future releases of reiser4 on a
> test partition.
> The problem might lie elsewhere in the Linux kernel considering how many
> panics it threw before hard locking up, but I am starting with the
> reiser4 maintainer and ML because kernel 5.8.X without loading the
> reiser4 module has been quite stable.
>
> # 2. Description
> The Linux kernel hard and/or soft locks up only hours after booting when
> using reiser4. It throws several panics beforehand. The applications that
> trigger this bug are rtorrent + dar + sync.
>
> # 3. Keywords
> hard lockup, soft lockup, reiser4, rcu
>
> # 4. Kernel information.
> 5.7.13 x86_64
>
> # 5. Kernel without bug.
> NA
>
> # 6. Oops message.
> Way too big. See attached.
> Here's something to whet your appetite:
>
> [ 4483.173140] NMI backtrace for cpu 0
> [ 4483.173143] CPU: 0 PID: 21593 Comm: dar Not tainted 5.7.13-nopreempt-Radeon-SI-dav10 #4
> [ 4483.173144] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./970A-DS3P, BIOS FD 02/26/2016
> [ 4483.173145] Call Trace:
> [ 4483.173148]
> [ 4483.173153] dump_stack+0x66/0x8b
> [ 4483.173155] nmi_cpu_backtrace+0x89/0x90
> [ 4483.173157] ? lapic_can_unplug_cpu+0x90/0x90
> ...
> [ 4483.173213] jput_final+0x303/0x320 [reiser4]
> [ 4483.173220] reiser4_invalidate_list+0x3e/0x50 [reiser4]
> [ 4483.173228] reiser4_write_logs+0x76/0x560 [reiser4]
> ...
> [ 4557.097894] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
> ...
> [ 4557.600871] __schedule+0x288/0x5d0
> [ 4557.600874] schedule+0x4a/0xb0
> [ 4557.600875] schedule_timeout+0x14a/0x300
> ...
>
> # 7. Shell script to trigger the problem.
> I tried to create an artificial workload using dd, cp, sync, and other
> programs to cause the fault without success.
>
> # 8. Environment.
> % dar --version
>
>  dar version 2.5.17, Copyright (C) 2002-2052 Denis Corbin
>    Long options support         : YES
>
>  Using libdar 5.13.0 built with compilation time options:
>    Libz compression (gzip)      : YES
>    Libbz2 compression (bzip2)   : YES
>    Liblzo2 compression (lzo)    : YES
>    Liblzma compression (xz)     : YES
>    Strong encryption (libgcrypt): YES
>    Public key ciphers (gpgme)   : NO
>    Extended Attributes support  : YES
>    Large files support (> 2GB)  : YES
>    ext2fs NODUMP flag support   : YES
>    Special allocation scheme    : NO
>    Integer size used            : unlimited
>    Thread safe support          : YES
>    Furtive read mode support    : YES
>    Linux ext2/3/4 FSA support   : YES
>    Mac OS X HFS+ FSA support    : NO
>    Detected system/CPU endian   : little
>    Posix fadvise support        : YES
>    Large dir. speed optimi.     : YES
>    Timestamp read accuracy      : 1 microsecond
>    Timestamp write accuracy     : 1 microsecond
>    Restores dates of symlinks   : YES
>
>  compiled the Nov 23 2018 with GNUC version 6.3.0 20170516
>  dar is part of the Disk ARchive suite (Release 2.5.17)
>
> % rtorrent -h
> Rakshasa's BitTorrent client version 0.9.6.
>
> % sync --version
> sync (GNU coreutils) 8.26
>
> % mkfs.reiser4 --version
> mkfs.reiser4 1.1.0
> Format release: 4.0.1
>
> % fsck.reiser4 --version
> fsck.reiser4 1.1.0
> Format release: 4.0.1

reiser4progs 1.1.x Software Framework Release Number (SFRN) 4.0.1 file
system utilities should not be used to check/fix media formatted 'a
priori' in SFRN 4.0.2 and vice-versa.
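
The rule just stated (tools and volume must share the same SFRN) can be checked mechanically from the `--version` output quoted above. The following is a rough Python sketch under stated assumptions: the "Format release:" line is taken from the quoted output, but the helper names are hypothetical, and the volume's SFRN is passed in by hand because how to read it from the media is not shown in this thread.

```python
import re


def format_release(version_output):
    """Return the 'Format release' triple from --version output, or None."""
    match = re.search(r"Format release:\s*(\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def tools_match_volume(version_output, volume_sfrn):
    """True only when the tools report exactly the volume's SFRN."""
    return format_release(version_output) == volume_sfrn


# Output as quoted in the report; the line layout is assumed.
reported = "mkfs.reiser4 1.1.0\nFormat release: 4.0.1"

print(format_release(reported))                 # (4, 0, 1)
print(tools_match_volume(reported, (4, 0, 2)))  # False
```

With the values from the report, the tools announce 4.0.1 while the kernel patch in use conforms to 4.0.2, so the check fails, which is the mismatch pointed out above.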
>
> % head -n28 /proc/cpuinfo # The info is just repeated for all the cores.
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 10
> model name      : AMD Phenom(tm) II X6 1090T Processor
> stepping        : 0
> microcode       : 0x1bf
> cpu MHz         : 2011.953
> cache size      : 512 KB
> physical id     : 0
> siblings        : 5
> core id         : 0
> cpu cores       : 5
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 6
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
> nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm
>
PROBLEM: Reiser4 hard lockup
Hello,

# Intro
Pardon my tardiness in reporting this, I was stalling my disk upgrade to
help test a fix for a reiserfs problem. I needed to get my life going
again before taking the time to report this.
This is a heads up for a serious problem. I no longer use reiser4
because I can't have my kernel hard and soft locking up within
hours of booting, and I no longer use 5.7.13. Therefore, I can't test a
fix for this, but I am willing to test future releases of reiser4 on a
test partition.
The problem might lie elsewhere in the Linux kernel considering how many
panics it threw before hard locking up, but I am starting with the
reiser4 maintainer and ML because kernel 5.8.X without loading the
reiser4 module has been quite stable.

# 2. Description
The Linux kernel hard and/or soft locks up only hours after booting when
using reiser4. It throws several panics beforehand. The applications that
trigger this bug are rtorrent + dar + sync.

# 3. Keywords
hard lockup, soft lockup, reiser4, rcu

# 4. Kernel information.
5.7.13 x86_64

# 5. Kernel without bug.
NA

# 6. Oops message.
Way too big. See attached.
Here's something to whet your appetite:

[ 4483.173140] NMI backtrace for cpu 0
[ 4483.173143] CPU: 0 PID: 21593 Comm: dar Not tainted 5.7.13-nopreempt-Radeon-SI-dav10 #4
[ 4483.173144] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./970A-DS3P, BIOS FD 02/26/2016
[ 4483.173145] Call Trace:
[ 4483.173148]
[ 4483.173153] dump_stack+0x66/0x8b
[ 4483.173155] nmi_cpu_backtrace+0x89/0x90
[ 4483.173157] ? lapic_can_unplug_cpu+0x90/0x90
...
[ 4483.173213] jput_final+0x303/0x320 [reiser4]
[ 4483.173220] reiser4_invalidate_list+0x3e/0x50 [reiser4]
[ 4483.173228] reiser4_write_logs+0x76/0x560 [reiser4]
...
[ 4557.097894] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
...
[ 4557.600871] __schedule+0x288/0x5d0
[ 4557.600874] schedule+0x4a/0xb0
[ 4557.600875] schedule_timeout+0x14a/0x300
...

# 7. Shell script to trigger the problem.
I tried to create an artificial workload using dd, cp, sync, and other
programs to cause the fault without success.

# 8. Environment.
% dar --version

 dar version 2.5.17, Copyright (C) 2002-2052 Denis Corbin
   Long options support         : YES

 Using libdar 5.13.0 built with compilation time options:
   Libz compression (gzip)      : YES
   Libbz2 compression (bzip2)   : YES
   Liblzo2 compression (lzo)    : YES
   Liblzma compression (xz)     : YES
   Strong encryption (libgcrypt): YES
   Public key ciphers (gpgme)   : NO
   Extended Attributes support  : YES
   Large files support (> 2GB)  : YES
   ext2fs NODUMP flag support   : YES
   Special allocation scheme    : NO
   Integer size used            : unlimited
   Thread safe support          : YES
   Furtive read mode support    : YES
   Linux ext2/3/4 FSA support   : YES
   Mac OS X HFS+ FSA support    : NO
   Detected system/CPU endian   : little
   Posix fadvise support        : YES
   Large dir. speed optimi.     : YES
   Timestamp read accuracy      : 1 microsecond
   Timestamp write accuracy     : 1 microsecond
   Restores dates of symlinks   : YES

 compiled the Nov 23 2018 with GNUC version 6.3.0 20170516
 dar is part of the Disk ARchive suite (Release 2.5.17)

% rtorrent -h
Rakshasa's BitTorrent client version 0.9.6.

% sync --version
sync (GNU coreutils) 8.26

% mkfs.reiser4 --version
mkfs.reiser4 1.1.0
Format release: 4.0.1

% fsck.reiser4 --version
fsck.reiser4 1.1.0
Format release: 4.0.1

% head -n28 /proc/cpuinfo # The info is just repeated for all the cores.
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 10
model name      : AMD Phenom(tm) II X6 1090T Processor
stepping        : 0
microcode       : 0x1bf
cpu MHz         : 2011.953
cache size      : 512 KB
physical id     : 0
siblings        : 5
core id         : 0
cpu cores       : 5
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter
bugs            : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs null_seg amd_e400 spectre_v1 spectre_v2
bogomips        : 7368.27
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate cpb

[8.3.] Module information (from /proc/modules):
Not available. If you really need this I can boot the old kernel and
insert mod.
Here are the linked-in modules; it's probably equivalent:
nls_iso8859_1 nls_cp437 fuse snd_emu10k1_synth