Re: rm: fts_read: No such file or directory

2021-01-14 Thread Paul de Weerd
Hi Otto,

Thanks for your reply.

On Thu, Jan 14, 2021 at 08:22:33AM +0100, Otto Moerbeek wrote:
| > Could there be some TOCTOU issue here somewhere?  Or some cache
| > misbehaviour?  Or is it really dying hardware?
| 
| My first bet would be some form of corruption. Flipped bits in e.g.
| directories while operating normally cannot be seen by the
| clean/unclean flag in the superblock. That one only records if the
| filesystem was unmounted before reboot, shutdown or crash.

I understand that - but then why would the error clear on subsequent
runs of rm?

| The forced fsck might reveal more.

It did find some issues, and then was waiting for my input overnight
(when the backup run mounted the filesystem and changed things).

** /dev/sd2a (ebb54a869d056df3.a)
** File system is already clean
** Last Mounted on /backup
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
ZERO LENGTH DIR I=57604332  OWNER=root MODE=40755
SIZE=0 MTIME=Jan 13 13:56 2021
CLEAR? [Fyn?] y

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [Fyn?] y

SUMMARY INFORMATION BAD
SALVAGE? [Fyn?] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [Fyn?] y

27766624 files, 396630326 used, 267754002 free (2016066 frags,
33217242 blocks, 0.3% fragmentation)

* FILE SYSTEM WAS MODIFIED *

I ran it once more after that, and more issues were found:

** /dev/sd2a (ebb54a869d056df3.a)
** File system is already clean
** Last Mounted on /backup
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [Fyn?] y

SUMMARY INFORMATION BAD
SALVAGE? [Fyn?] y

BLK(S) MISSING IN BIT MAPS
SALVAGE? [Fyn?] y

27884252 files, 397169471 used, 267214857 free (1944825 frags,
33158754 blocks, 0.3% fragmentation)

* FILE SYSTEM WAS MODIFIED *

The third fsck finally came back clean:

** /dev/sd2a (ebb54a869d056df3.a)
** File system is already clean
** Last Mounted on /backup
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
27884252 files, 397169471 used, 267214857 free (1944825 frags,
33158754 blocks, 0.3% fragmentation)
  136m19.01s real     4m00.56s user    20m33.85s system


I'll put it down to those errors, but I still don't understand why
re-trying would fix these kinds of issues.

Thanks again, Otto!

Paul

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



Re: rm: fts_read: No such file or directory

2021-01-13 Thread Otto Moerbeek
On Wed, Jan 13, 2021 at 09:46:27PM +0100, Paul de Weerd wrote:

> Hi all,
> 
> While doing some clean-up on my backup filesystem (which extensively
> uses hardlinks), I came across the error in Subject:
> 
>       rm: fts_read: No such file or directory
> 
> Traversing the hierarchy I was trying to remove, I get similar
> fts_read errors when I `ls` in certain places, but a repeated rm runs
> to completion fine (the tree is gone afterwards).
> 
> There's nothing in dmesg suggesting filesystem corruption, the
> filesystem unmounts and remounts cleanly, I'm running a forced fsck
> now which says "** File system is already clean".  It's a rather large
> filesystem with many inodes in use, so it'll take some time to
> complete.  Also, it's on a softraid crypto device, if that matters:
> 
> sd2: 5231654MB, 512 bytes/sector, 10714427745 sectors
> 
> Reading fts_read(3) wasn't really enlightening as to why a directory
> that's supposedly there, wouldn't be there anymore.  (note that I
> wasn't running another rm in the same tree in parallel when I got
> these errors - I did try to force the error by doing just that, but
> that went through without a single error).
> 
> Could there be some TOCTOU issue here somewhere?  Or some cache
> misbehaviour?  Or is it really dying hardware?

My first bet would be some form of corruption. Flipped bits in e.g.
directories while operating normally cannot be seen by the
clean/unclean flag in the superblock. That one only records if the
filesystem was unmounted before reboot, shutdown or crash.

The forced fsck might reveal more.

-Otto



rm: fts_read: No such file or directory

2021-01-13 Thread Paul de Weerd
Hi all,

While doing some clean-up on my backup filesystem (which extensively
uses hardlinks), I came across the error in Subject:

rm: fts_read: No such file or directory

Traversing the hierarchy I was trying to remove, I get similar
fts_read errors when I `ls` in certain places, but a repeated rm runs
to completion fine (the tree is gone afterwards).

There's nothing in dmesg suggesting filesystem corruption, the
filesystem unmounts and remounts cleanly, I'm running a forced fsck
now which says "** File system is already clean".  It's a rather large
filesystem with many inodes in use, so it'll take some time to
complete.  Also, it's on a softraid crypto device, if that matters:

sd2: 5231654MB, 512 bytes/sector, 10714427745 sectors

Reading fts_read(3) wasn't really enlightening as to why a directory
that's supposedly there, wouldn't be there anymore.  (note that I
wasn't running another rm in the same tree in parallel when I got
these errors - I did try to force the error by doing just that, but
that went through without a single error).

Could there be some TOCTOU issue here somewhere?  Or some cache
misbehaviour?  Or is it really dying hardware?
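
To make the TOCTOU question concrete, here is a minimal Python sketch of
how a tree walk can report "No such file or directory" for a name it has
just read from a directory.  It does not use fts(3), and the names
walk_with_race and demo are made up for illustration.  It only shows the
race itself; it says nothing about whether that is what happened here.

```python
import errno
import os
import tempfile


def walk_with_race(root, vanish):
    """List root, then lstat each name; `vanish` is removed in between."""
    errors = []
    names = sorted(os.listdir(root))        # time of check: names are read
    os.unlink(os.path.join(root, vanish))   # simulated concurrent removal
    for name in names:
        try:
            os.lstat(os.path.join(root, name))  # time of use
        except OSError as e:
            errors.append((name, e.errno))
    return errors


def demo():
    # Hypothetical three-file directory; "b" disappears mid-walk.
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c"):
            open(os.path.join(root, name), "w").close()
        return walk_with_race(root, "b")


if __name__ == "__main__":
    for name, err in demo():
        print(f"{name}: {os.strerror(err)}")
```

A second pass over the same directory would list only the surviving
names, so a race of this kind would not repeat on a retry.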

Paul 'WEiRD' de Weerd

OpenBSD 6.8-current (GENERIC.MP) #267: Sat Jan  9 19:23:55 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34311208960 (32721MB)
avail mem = 33256046592 (31715MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe6690 (57 entries)
bios0: vendor Dell Inc. version "2.10.0" date 05/24/2018
bios0: Dell Inc. PowerEdge R210 II
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP SPMI DMAR ASF! HPET APIC MCFG BOOT SSDT ASPT SSDT SSDT SPCR HEST ERST BERT EINJ
acpi0: wakeup devices P0P1(S4) GLAN(S0) EHC1(S4) EHC2(S4) XHC_(S4) RP01(S5) PXSX(S4) RP02(S5) PXSX(S4) RP03(S5) PXSX(S4) RP04(S5) PXSX(S4) RP05(S5) PXSX(S4) RP06(S5) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz, 2394.91 MHz, 06-2a-07
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz, 2394.58 MHz, 06-2a-07
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz, 2394.58 MHz, 06-2a-07
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz, 2394.58 MHz, 06-2a-07
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz, 2394.58 MHz, 06-2a-07
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 0, core 2, package 0
cpu5 at